Cover Material, Copyright, and
License


Copyright 2022 Mark Watson. All rights reserved. 


This Book is Licensed with Creative Commons 
Attribution CC BY Version 3 That Allows Reuse In 
Derived Works 


You are free to: 


+ Share — copy and redistribute the material in any medium or format 
+ Adapt — remix, transform, and build upon the material for any purpose, even commercially. 


You are required to give appropriate credit in any derived works: 


This work is derived from all or part of "Artificial Intelligence Using Swift" by Mark Watson.
Source: https://leanpub.com/SwiftAI


This eBook will be updated occasionally, so please periodically check the leanpub.com web page for
this book (https://leanpub.com/SwiftAI) for updates.


This is the first edition, released in spring of 2022.
Please visit the author’s website (http://markwatson.com).


If you found a copy of this book on the web and find it of value then please consider buying a copy
at https://leanpub.com/SwiftAI to support the author and fund work for future updates. You can also
see all of my books on my website https://markwatson.com/#books.


Preface 


Why use Swift for hacking AI? Common Lisp has been my go-to language for artificial intelligence
development and research since 1982, and my transition to Swift was a slow one. During this
transition I prototyped a new project in parallel using both Swift and Common Lisp, weighing the
advantages of both for my current requirements. The Swift version of this project, included in this
book, runs on macOS, iOS, and iPadOS. The macOS version is available on Apple’s App Store. Several
of the utilities developed in this book were used in this project.


This book starts out slowly with simple examples that show how to access Swift library packages on
GitHub, tips on writing Swift command line apps, and web scraping. We then proceed to using Apple’s
CoreML for Natural Language Processing (NLP), training and using your own CoreML models, using
OpenAI’s GPT-3 APIs, and finally several semantic web/linked data examples. The book ends with the
example application KGN, which is available on the App Store
(https://apps.apple.com/us/app/kgn/id1514197947?mt=12). It is not my intention to cover in detail
the use of SwiftUI for building iOS/iPadOS/macOS applications but I thought my readers might enjoy
seeing several of the techniques covered in the book integrated into an example app.


I have used Common Lisp for AI research projects and for AI product development and delivery
since 1982. There is something special about using a language for almost forty years. All that said, I
find Swift a compelling choice now for several reasons:


+ Flexible language with many features I rely on, like closures and an interactive
functional programming style.

+ Built in support for deep learning neural network models for natural language processing,
predictive models, etc.

+ First class support for iOS and macOS development.

+ Good support for server side applications hosted on Linux.


Swift is a programmer-efficient language: code is concise and easy to read, and high quality libraries
from Apple and third parties mean that often there is less code to write. I will share with you
my Swift development workflow that combines interactive development of code in playgrounds,
development of higher level libraries in text only or command line applications, and my general
strategy for writing iOS and macOS applications after low level and intermediate code is written
and debugged.


Parts of this Book are Specific for macOS and iOS, with
Some Support for Linux


Swift is a general purpose language that is well supported in macOS, iOS, and Linux, with some 
support in Windows. Here, we cover the use of Swift on macOS and iOS. Some of the examples in 
this book rely on libraries that are specifically available on macOS and iOS like CoreML and the 
NLP libraries. Several book examples also work on Linux, such as the examples using SQLite, the 
Microsoft Azure search APIs, web scraping, and semantic web/linked data. 


Code for this Book 


Because of the way the Swift Package Manager works, I organized all book examples that build 
libraries as separate GitHub repos so the libraries can be easily used in other book examples as well 
as your own software projects. The separate library GitHub repositories are: 


+ https://github.com/mark-watson/SparqlQuery_swift - SPARQL Swift library for my Swift AI
book.

+ https://github.com/mark-watson/QuestionAnswering_BERT_swift - modification of Apple’s
question answering demo to use DBPedia.

+ https://github.com/mark-watson/swift-coreml-wisconsin_data_create_model - create CoreML
models from training data files of Wisconsin Cancer data.

+ https://github.com/mark-watson/swift-coreml-wisconsin_data_predict_with_model - use the
pretrained Wisconsin Cancer data model.

+ https://github.com/mark-watson/ShellProcess_swift - library for spawning shell processes
and capturing their output.

+ https://github.com/mark-watson/WebScraping_swift - library for scraping web sites.

+ https://github.com/mark-watson/OpenAI_swift - library for using OpenAI’s GPT-3 APIs.

+ https://github.com/mark-watson/Nlp_swift - library that uses pretrained CoreML NLP models.

+ https://github.com/mark-watson/KGN - SwiftUI based application supporting macOS, iPadOS,
and iOS. The macOS version is in Apple’s app store.


I suggest cloning all of these GitHub repositories right now so you can have the example source code 
at hand while reading this book. 





All of the code examples are licensed using the Apache 2 license. You are free to reuse the book
example code in your own projects (open source, commercial), with attribution of my copyright and 
the Apache 2 license. 


Except for the last SwiftUI example application, all sample programs are written as command line 
utilities. I considered using Swift playgrounds for some of the examples but decided that packaging 
as a combination of libraries and command line utilities would tend to make the example code more 
useful for your own projects. 


http://www.knowledgegraphnavigator.com/


Author's Background


I live in Sedona, Arizona with my wife and pet parrot. Our children and grandchildren live in 
California, Rhode Island, and the state of Washington. 


I have written 20+ books, mostly about artificial intelligence. I have over 50 US patents. 


I write about technologies that I have used throughout my career: knowledge representation using
semantic web and linked data, machine learning and deep learning, and natural language processing.
I am grateful for the companies where I have worked (SAIC, Google, Capital One, Olive AI, Babylist,
etc.) that have supported this work since 1982.


As an author, I hope that the material in this book entertains you and will be useful in your work.


A Request from the Author


I spent time writing this book to help you, dear reader. I release this book under the Creative 
Commons license and set the minimum purchase price to Free in order to reach the most readers. If 
you found this book on the web (or it was given to you) and if it provides value to you then please 
consider doing the following to support my future writing efforts and also to support future updates 
to this book: 


+ Purchase a copy of this book (https://leanpub.com/SwiftAI) or any other of my Leanpub books at
https://leanpub.com/u/markwatson


I enjoy writing and your support helps me write new editions and updates for my books and to
develop new book projects. Thank you!


Cover Art


The cover picture was taken by WikiMedia Commons user Keta
(https://commons.wikimedia.org/wiki/User:Keta) and is available for use under the
Creative Commons License CC BY-SA 2.5.


CoreML Libraries Used in this Book


+ CoreML general overview: https://developer.apple.com/documentation/coreml

+ MLClassifier https://developer.apple.com/documentation/createml/mlclassifier

+ MLTextClassifier https://developer.apple.com/documentation/createml/mltextclassifier

+ NLModel https://developer.apple.com/documentation/naturallanguage/nlmodel

+ Natural Language Framework https://developer.apple.com/documentation/naturallanguage

+ MLCustomLayer https://developer.apple.com/documentation/coreml/mlcustomlayer


Swift 3rd Party Libraries 


We use the following 3rd party libraries: 


+ https://github.com/SwiftyJSON/SwiftyJSON


Acknowledgements 


I thank my wife Carol for editing this manuscript, finding typos, and suggesting improvements. 







Part 1: Introduction and Short 
Examples 


We begin with just enough of an introduction to Swift for you to understand the programming
examples. After introducing the language we will look at a few short examples that provide code and
techniques we use later in the book:


+ Creating Swift projects
+ Writing command line utilities
+ Web scraping


Setting Up Swift for Command Line 
Development 


Except for the last chapter in this book, which uses Xcode to develop a complete macOS/iOS/iPadOS
example application, I assume that you will work through the book examples using the command
line and your favorite editor. If you want to use Xcode for the command line examples, you can
open the Swift package file from the command line, which launches Xcode. For example:


cd SparqlQuery_swift
open Package.swift


You will notice that most of the examples are command line apps or libraries with command line test
programs, and that the README.md files in the example directories provide instructions for building
and running on the command line.


You can also run Xcode and from the File Menu open an example Package.swift file. You can then 
use the Product / Test menu to run the test code for the example. You might need to use the View / 
Debug Area / Active Console menu to show the output area. 


I assume that you are familiar with the Swift programming language and Xcode. 


Swift is a general purpose language that is well supported in macOS and iOS, with good support for 
Linux, and with some support in Windows. For the purposes of this book, we are only considering 
the use of Swift on macOS and iOS. Most of the examples in this book rely on libraries that are 
specifically available on macOS and iOS like CoreML and the NLP libraries. 


There are great free resources for the Swift language on the web, in other commercial books, and in
Apple’s free Swift books. Here I provide just enough material on the Swift language for you to
understand and work with the book examples. After working through this book’s material you will
be able to add machine learning, natural language processing, and knowledge representation to your
applications. There are parts of the Swift language that we don’t need for the material here and
that we won’t cover.


Installing Swift Packages 


We will use the Swift Package Manager (https://swift.org/package-manager/). You should pause
reading now and install Swift if you have not already done so; the Swift Package Manager is
included with the Swift toolchain and with Xcode.


I occasionally use the Vapor web framework library (https://vapor.codes), although not in this
book. We use this 3rd party library as an example of building a library locally from source code.
Clone the git repository https://github.com/vapor/vapor and build it:


git clone https://github.com/vapor/vapor.git
cd vapor
swift build


I don’t usually install libraries locally from source code unless I am curious about the implementation 
and want to read through the source code. Later we will see how to reference Swift libraries hosted 
on GitHub in a project’s Package.swift file. 


Creating Swift Packages 


We will cover using the Swift Package Manager to create new packages using the command line here.
Later we will create projects using Apple’s Xcode IDE when we develop the example application
Knowledge Graph Navigator.


You will want to use the Swift Package Manager documentation
(https://github.com/apple/swift-package-manager/blob/main/Documentation/Usage.md) for reference.


We will be generating executable projects and library (with a sample main program) projects. The
commands for generating the stub for an executable application project are:


mkdir BingSearch
cd BingSearch
swift package init --type executable


and the commands for generating the stub of a library with a demo main program are:


mkdir SparqlQuery
cd SparqlQuery
swift package init --type library


Accessing Libraries that You Write in Other Projects


You can reference Swift libraries using the Package.swift file for each of your projects. We will
look at parts of two Package.swift files here. The first is for my SPARQL query client library
that we will develop in a later chapter. This library, SparqlQuery_swift, is used in two book
examples: the Knowledge Graph Navigator (KGN) macOS/iOS/iPadOS example application and the
text version KnowledgeGraphNavigator_swift.


import PackageDescription

let package = Package(
    name: "SparqlQuery_swift",
    products: [
        .library(
            name: "SparqlQuery_swift",
            targets: ["SparqlQuery_swift"]),
    ],
    dependencies: [
        .package(url: "https://github.com/SwiftyJSON/SwiftyJSON.git",
                 .branch("master")),
    ],
    targets: [
        .target(
            name: "SparqlQuery_swift",
            dependencies: ["SwiftyJSON"]),
        .testTarget(
            name: "SparqlQuery_swiftTests",
            dependencies: ["SparqlQuery_swift", "SwiftyJSON"]),
    ]
)


The Package.swift file for the text version KnowledgeGraphNavigator_swift is shown here:


import PackageDescription

let package = Package(
    name: "KnowledgeGraphNavigator_swift",
    platforms: [
        .macOS(.v10_15),
    ],
    dependencies: [
        .package(url: "https://github.com/SwiftyJSON/SwiftyJSON.git",
                 .branch("master")),
        .package(url: "https://github.com/scinfu/SwiftSoup.git", from: "1.7.4"),
        .package(url: "git@github.com:mark-watson/SparqlQuery_swift.git",
                 .branch("main")),
        .package(url: "git@github.com:mark-watson/Nlp_swift.git", .branch("main")),
    ],
    targets: [
        // Targets are the basic building blocks of a package.
        // A target can define a module or a test suite.
        // Targets can depend on other targets in this package,
        // and on products in packages this package depends on.
        .target(
            name: "KnowledgeGraphNavigator_swift",
            dependencies: ["SparqlQuery_swift", "Nlp_swift",
                           "SwiftyJSON", "SwiftSoup"]),
    ]
)


Hopefully you have cloned the git repositories for each book example and understand how I have 
configured the examples for your use. 
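

The two listings above show only parts of each manifest. As a rough sketch of how the pieces fit
together, a complete Package.swift for a small command line tool might look like the following.
The package name MyScraperTool, the tools version 5.5, and the SwiftyJSON version requirement are
assumptions made for illustration, and I am also assuming that the WebScraping_swift package
exports a library product with the same name as its repository.

```swift
// swift-tools-version:5.5
// The tools-version comment above must be the first line of a real manifest.
import PackageDescription

// "MyScraperTool" is a hypothetical package name used only for illustration.
let package = Package(
    name: "MyScraperTool",
    platforms: [
        .macOS(.v10_15),
    ],
    dependencies: [
        // A versioned dependency: any release from 5.0.0 up to the next major version.
        .package(url: "https://github.com/SwiftyJSON/SwiftyJSON.git", from: "5.0.0"),
        // A branch dependency, as used for the book's own libraries.
        .package(url: "https://github.com/mark-watson/WebScraping_swift.git",
                 .branch("main")),
    ],
    targets: [
        .executableTarget(
            name: "MyScraperTool",
            dependencies: [
                .product(name: "SwiftyJSON", package: "SwiftyJSON"),
                // Assumed product name; check the library's own Package.swift.
                .product(name: "WebScraping_swift", package: "WebScraping_swift"),
            ]),
    ]
)
```

Branch dependencies always track the newest commit, which is convenient while a library is still
changing; a version requirement such as from: gives more repeatable builds.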


For the rest of this book, you can read chapters in any order. In some cases, earlier chapters will 
contain implementations of libraries used in later chapters. 


Background Information for Writing
Swift Command Line Utilities


This short chapter contains example code and utilities for writing command line programs, using
external shell processes, and doing simple file I/O.


Using Shell Processes 


The library for using shell processes is one of my GitHub projects so you can include it in other 
projects using: 


dependencies: [
    .package(url: "git@github.com:mark-watson/ShellProcess_swift.git",
             .branch("main")),
],


You can clone this repository if you want to have the source code at hand:


git clone https://github.com/mark-watson/ShellProcess_swift.git


The following listing shows the library implementation. We use the constructor Process from the
Apple Foundation library to get a new process object and then set its executableURL and arguments
fields. We then create a new Unix style pipe to capture the output from the shell process we are
starting and attach it to the process. After we run the task, we capture the output and return it
as the value of function run_in_shell.


import Foundation

@available(OSX 10.13, *)
public func run_in_shell(commandPath: String, argList: [String] = []) -> String {
    let task = Process()
    task.executableURL = URL(fileURLWithPath: commandPath)
    task.arguments = argList
    let pipe = Pipe()
    task.standardOutput = pipe
    do {
        // try! stops the program if the executable cannot be launched:
        try! task.run()
        let data = pipe.fileHandleForReading.readDataToEndOfFile()
        let output: String? = String(data: data, encoding: String.Encoding.utf8)
        if let output = output {
            if !output.isEmpty {
                return output.trimmingCharacters(in: .whitespacesAndNewlines)
            }
        }
    }
    // no output, or output that was not valid UTF-8:
    return ""
}


As in most examples in this book we use the Swift testing framework to run the example code at the 
command line using swift test. Running swift test does an implicit swift build. 


import XCTest
@testable import ShellProcess_swift

final class ShellProcessTests: XCTestCase {
    func testExample() {
        // This is an example of a functional test case.
        // Use XCTAssert and related functions to verify your tests produce the
        // correct results.
        print("** s1:")
        let s1 = run_in_shell(commandPath: "/bin/ps", argList: ["a"])
        print(s1)
        let s2 = run_in_shell(commandPath: "/bin/ls", argList: ["."])
        print("** s2:")
        print(s2)
        let s3 = run_in_shell(commandPath: "/bin/sleep", argList: ["2"])
        print("** s3:")
        print(s3)
    }

    static var allTests = [
        ("testExample", testExample),
    ]
}


The test output (with some text removed for brevity) is: 


$ swift test
Test Suite 'All tests' started at 2021-08-06 16:36:21.447
** s1:
  PID   TT  STAT      TIME COMMAND
  ...
 8665 s002  S      0:00.03 /Applications/Xcode.app/Contents/Developer/usr/bin/xctest \
/Users/markw_1/GIT_swift_book/ShellProcess_swift/.build/arm64-apple-macosx/debug/Sh\
ellProcess_swiftPackageTests.xctest
 8666 s002  R      0:00.00 /bin/ps a
** s2:
Package.swift
README.md
Sources
Tests
** s3:

Test Suite 'All tests' passed at 2021-08-06 16:36:23.468.
Executed 1 test, with 0 failures (0 unexpected) in 2.019 (2.021) seconds
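

The run_in_shell function returns only standard output and stops the program if the command cannot
be launched. If you also need the error output and the exit status, a variation along the following
lines may be useful. This is only a sketch: the names ShellResult and runCapturingAll are my own
inventions for illustration and are not part of the ShellProcess_swift library, and I assume a
deployment target of macOS 10.13 or later so that no availability annotation is needed.

```swift
import Foundation

// Hypothetical result type for illustration; it is not part of ShellProcess_swift.
struct ShellResult {
    let stdout: String
    let stderr: String
    let exitCode: Int32
}

// Runs an executable and returns its trimmed output, error output, and exit status.
// Returns nil if the process could not be launched (for example, a bad path).
func runCapturingAll(commandPath: String, argList: [String] = []) -> ShellResult? {
    let task = Process()
    task.executableURL = URL(fileURLWithPath: commandPath)
    task.arguments = argList
    let outPipe = Pipe()
    let errPipe = Pipe()
    task.standardOutput = outPipe
    task.standardError = errPipe
    do {
        try task.run()
    } catch {
        return nil
    }
    // Reading the pipes one after the other is fine for small outputs; a command
    // that writes a large amount to stderr could stall while we wait on stdout.
    let outData = outPipe.fileHandleForReading.readDataToEndOfFile()
    let errData = errPipe.fileHandleForReading.readDataToEndOfFile()
    task.waitUntilExit()
    // Decode as UTF-8 (invalid bytes are replaced) and trim surrounding whitespace.
    func clean(_ data: Data) -> String {
        return String(decoding: data, as: UTF8.self)
            .trimmingCharacters(in: .whitespacesAndNewlines)
    }
    return ShellResult(stdout: clean(outData), stderr: clean(errData),
                       exitCode: task.terminationStatus)
}
```

A caller can then check exitCode before trusting stdout, for example after calling
runCapturingAll(commandPath: "/bin/ls", argList: ["."]).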


FileIO Examples


This file I/O example uses the ShellProcess_swift library we saw in the last section so if you were to
create your own Swift project with the following code listing, you would have to add this dependency
in the Package.swift file.


When writing command line Swift programs you will often need to do simple file I/O so let’s look at
some examples here:

import Foundation
import ShellProcess_swift // my library

@available(OSX 10.13, *)
func test_files_demo() -> Void {
    // In order to append to an existing file, you need to get a file handle
    // and seek to the end of a file. The following will not work because
    // the second write replaces the contents of out.txt:
    let s = "the dog chased the cat\n"
    try! s.write(toFile: "out.txt", atomically: true,
                 encoding: String.Encoding.ascii)
    let s2 = "a second string\n"
    try! s2.write(toFile: "out.txt", atomically: true,
                  encoding: String.Encoding.ascii)
    let aString = try! String(contentsOfFile: "out.txt")
    print(aString)

    // For simple use cases, simply appending strings, then writing
    // the result atomically works fine:
    var s3 = "the dog chased the cat\n"
    s3 += "a second string\n"
    try! s3.write(toFile: "out2.txt", atomically: true,
                  encoding: String.Encoding.ascii)
    let aString2 = try! String(contentsOfFile: "out2.txt")
    print(aString2)

    // list files in current directory:
    let ls = run_in_shell(commandPath: "/bin/ls", argList: ["."])
    print(ls)

    // remove two temporary files:
    let shellOutput = run_in_shell(commandPath: "/bin/rm",
                                   argList: ["out.txt", "out2.txt"])
    print(shellOutput)
}

if #available(OSX 10.13, *) {
    test_files_demo()
}


I created a temporary Swift project with the previous code listing and a Package.swift file. I built
and ran this example using the swift command line tool.


Unlike the example in the last section where we built a reusable library with a test program, here
we have a standalone program contained in a single file so we will use swift run to build and run
this example:


$ swift run
Fetching git@github.com:mark-watson/ShellProcess_swift.git from cache
Cloning git@github.com:mark-watson/ShellProcess_swift.git
Resolving git@github.com:mark-watson/ShellProcess_swift.git at main
[5/5] Build complete!
a second string

the dog chased the cat
a second string

Package.resolved
Package.swift
README.md
Sources
out.txt
out2.txt
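

The first comment in the file I/O listing notes that appending to an existing file requires a file
handle positioned at the end of the file. The following is a minimal sketch of that approach using
Foundation’s FileHandle. The function name appendText is hypothetical (it is not in any of the
book libraries), and I use the older non-throwing FileHandle calls for brevity.

```swift
import Foundation

// Hypothetical helper: appends text to the file at path, creating the file
// if it does not exist yet. Returns false if the file cannot be created or opened.
func appendText(_ text: String, toFileAtPath path: String) -> Bool {
    let data = Data(text.utf8)
    let fileManager = FileManager.default
    if !fileManager.fileExists(atPath: path) {
        // Nothing to append to yet, so create the file with the text as its contents.
        return fileManager.createFile(atPath: path, contents: data)
    }
    guard let handle = FileHandle(forWritingAtPath: path) else {
        return false
    }
    defer { handle.closeFile() }
    handle.seekToEndOfFile()   // move the write position past the existing contents
    handle.write(data)
    return true
}
```

Calling appendText twice with the two strings from the listing leaves both lines in the file,
unlike the two successive write(toFile:) calls, where the second call replaces the first.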


Swift REPL 


There is an example of using the Swift REPL at the end of the next chapter on web scraping. For
reference, you can start a REPL with:


$ swift run --repl
Type :help for assistance.
  1> import WebScraping_swift
  2> webPageText(uri: "https://markwatson.com")
$R0: String = "Mark Watson: AI Practitioner and Polyglot Programmer"...
  3> public func foo(s: String) -> String { return s }
  4> foo(s: "cat")
$R1: String = "cat"
  5>


You can import packages and interactively enter Swift expressions, including defining functions.


In the next chapter we will look at one more simple example, building a web scraping library, before
getting to the machine learning and NLP part of the book.


Web Scraping


It is important to respect the property rights of web site owners and abide by their terms and
conditions for use. The Wikipedia article on Fair Use (https://en.wikipedia.org/wiki/Fair_use)
provides a good overview of using copyrighted material.


The web scraping code we develop here uses the Swift library SwiftSoup that is loosely based on
the BeautifulSoup libraries available in other programming languages.


For my work and research, I have been most interested in using web scraping to collect text data for
natural language processing. Other common applications include writing AI news collection and
summarization assistants and trying to predict stock prices based on comments in social media, which
is what we did at Webmind Corporation in 2000 and 2001.


I wrote a simple web scraping library that is available at
https://github.com/mark-watson/WebScraping_swift that you can use in your projects by putting the
following dependency in your Package.swift file:


dependencies: [
    .package(url: "git@github.com:mark-watson/WebScraping_swift.git",
             .branch("main")),
],


Here is the main implementation file for the library:


import Foundation
import SwiftSoup

public func webPageText(uri: String) -> String {
    guard let myURL = URL(string: uri) else {
        print("Error: \(uri) doesn't seem to be a valid URL")
        fatalError("invalid URI")
    }
    // try! stops the program if the page cannot be fetched, decoded, or parsed:
    let html = try! String(contentsOf: myURL, encoding: .ascii)
    let doc: Document = try! SwiftSoup.parse(html)
    let plain_text = try! doc.text()
    return plain_text
}


func webPageHeadersHelper(uri: String, headerName: String) -> [String] {
    var ret: [String] = []
    guard let myURL = URL(string: uri) else {
        print("Error: \(uri) doesn't seem to be a valid URL")
        fatalError("invalid URI")
    }
    do {
        let html = try String(contentsOf: myURL, encoding: .ascii)
        let doc: Document = try SwiftSoup.parse(html)
        let h1_headers = try doc.select(headerName)
        for el in h1_headers {
            let h1 = try el.text()
            ret.append(h1)
        }
    } catch {
        print("Error")
    }
    return ret
}

public func webPageH1Headers(uri: String) -> [String] {
    return webPageHeadersHelper(uri: uri, headerName: "h1")
}

public func webPageH2Headers(uri: String) -> [String] {
    return webPageHeadersHelper(uri: uri, headerName: "h2")
}

public func webPageAnchors(uri: String) -> [[String]] {
    var ret: [[String]] = []
    guard let myURL = URL(string: uri) else {
        print("Error: \(uri) doesn't seem to be a valid URL")
        fatalError("invalid URI")
    }
    do {
        let html = try String(contentsOf: myURL, encoding: .ascii)
        let doc: Document = try SwiftSoup.parse(html)
        let anchors = try doc.select("a")
        for a in anchors {
            let text = try a.text()
            let a_uri = try a.attr("href")
            // links within the same page are returned as full URIs:
            if a_uri.hasPrefix("#") {
                ret.append([text, uri + a_uri])
            } else {
                ret.append([text, a_uri])
            }
        }
    } catch {
        print("Error")
    }
    return ret
}


Here I wrote utility functions to get the plain text from a web site, HTML header text, and anchors. 
You can clone this library and extend it for other types of HTML elements you may need to process. 
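

One possible extension: webPageAnchors turns links that start with # into full URIs but returns
other relative links (for example /privacy.html) unchanged. Foundation can resolve any kind of
href against the page URL. The following is a sketch of that idea only; the function name
resolveHref is hypothetical and is not part of WebScraping_swift, and example.com is a placeholder
host.

```swift
import Foundation

// Hypothetical helper: resolves an href found on a page against the page's own URI.
// Returns nil when either string cannot be parsed as a URL.
func resolveHref(_ href: String, onPage pageURI: String) -> String? {
    guard let base = URL(string: pageURI),
          let resolved = URL(string: href, relativeTo: base) else {
        return nil
    }
    // absoluteURL combines the relative part with the base URL.
    return resolved.absoluteURL.absoluteString
}

// Example inputs covering a relative path, a site-rooted path, and a fragment:
let page = "https://example.com/docs/index.html"
for href in ["page2.html", "/privacy.html", "#top", "https://leanpub.com/u/markwatson"] {
    print(href, "->", resolveHref(href, onPage: page) ?? "(invalid)")
}
```

Inside webPageAnchors, the if/else on hasPrefix("#") could then be replaced by a single call to a
helper like this one, falling back to the raw href when it returns nil.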


The test program shows how to call the APIs in the library: 


import XCTest
import Foundation
import SwiftSoup

@testable import WebScraping_swift

final class WebScrapingTests: XCTestCase {
    func testGetWebPage() {
        let text = webPageText(uri: "https://markwatson.com")
        print("\n\n\tTEXT FROM MARK's WEB SITE:\n\n", text)
    }

    func testToShowSwiftSoupExamples() {
        let myURLString = "https://markwatson.com"
        let h1_headers = webPageH1Headers(uri: myURLString)
        print("\n\n++ h1_headers:", h1_headers)
        let h2_headers = webPageH2Headers(uri: myURLString)
        print("\n\n++ h2_headers:", h2_headers)
        let anchors = webPageAnchors(uri: myURLString)
        print("\n\n++ anchors:", anchors)
    }

    static var allTests = [("testGetWebPage", testGetWebPage),
                           ("testToShowSwiftSoupExamples",
                            testToShowSwiftSoupExamples)]
}


Here we run the unit tests (with much of the output not shown for brevity):


$ swift test

        TEXT FROM MARK's WEB SITE:

Mark Watson: AI Practitioner and Polyglot Programmer | Mark Watson Read my Blog \
Fun stuff My Books My Open Source Projects Hire Me Free Mentoring \
Privacy Policy Mark Watson: AI Practitioner and Polyglot Programmer I am the author \
of 20+ books on Artificial Intelligence, Common Lisp, Deep Learning, Haskell, Clojur\
e, Java, Ruby, Hy language, and the Semantic Web. I have 55 US Patents. My customer \
list includes: Google, Capital One, Olive AI, CompassLabs, Disney, SAIC, Americast, \
PacBell, CastTV, Lutris Technology, Arctan Group, Sitescout.com, Embed.ly, and Webmi\
nd Corporation.


++ h1_headers: ["Mark Watson: AI Practitioner and Polyglot Programmer", "The books t\
hat I have written", "Fun stuff", "Open Source", "Hire Me", "Free Mentoring", "Priva\
cy Policy"]


++ h2_headers: ["I am the author of 20+ books on Artificial Intelligence, Common Lis\
p, Deep Learning, Haskell, Clojure, Java, Ruby, Hy language, and the Semantic Web. I\
 have 55 US Patents.", "Other published books:"]


++ anchors: [["Read my Blog", "https://mark-watson.blogspot.com"], ["Fun stuff", "ht\
tps://markwatson.com#fun"], ["My Books", "https://markwatson.com#books"], ["My Open \
Source Projects", "https://markwatson.com#opensource"], ["Hire Me", "https://markwat\
son.com#consulting"], ["Free Mentoring", "https://markwatson.com#mentoring"], ["Priv\
acy Policy", "https://markwatson.com/privacy.html"], ["leanpub", "https://leanpub.co\
m/u/markwatson"], ["GitHub", "https://github.com/mark-watson"], ["LinkedIn", "https:\
//www.linkedin.com/in/marklwatson/"], ["Twitter", "https://twitter.com/mark_l_watson\
"], ["leanpub", "https://leanpub.com/lovinglisp"], ["leanpub", "https://leanpub.com/\
haskell-cookbook/"], ["leanpub", "https://leanpub.com/javaai"],
]
Test Suite 'All tests' passed at 2021-08-06 17:37:11.062.
Executed 2 tests, with 0 failures (0 unexpected) in 2.471 (2.472) seconds


Running in the Swift REPL


$ swift run --repl
[1/1] Build complete!
Launching Swift REPL with arguments: -I/Users/markw_1/GIT_swift_book/WebScraping_swi\
ft/.build/arm64-apple-macosx/debug -L/Users/markw_1/GIT_swift_book/WebScraping_swift\
/.build/arm64-apple-macosx/debug -lWebScraping_swift__REPL
Welcome to Apple Swift version 5.5 (swiftlang-1300.0.29.102 clang-1300.0.28.1).
Type :help for assistance.
  1> import WebScraping_swift
  2> webPageText(uri: "https://markwatson.com")
$R0: String = "Mark Watson: AI Practitioner and Polyglot Programmer | Mark Watson \
Read my Blog Fun stuff My Books My Open Source Projects Privacy Policy \
Mark Watson: AI Practitioner and Polyglot Programmer I am the author of 20+ books on\
 Artificial Intelligence, Common Lisp, Deep Learning, Haskell, Clojure, Java, Ruby, \
Hy language, and the Semantic Web. I have 55 US Patents. My customer list includes: \
Google, Capital One, Babylist, Olive AI, CompassLabs, Disney, SAIC, Americast, PacBe\
ll, CastTV, Lutris Technology, Arctan Group, Sitescout.com, Embed.ly, and Webmind Co\
rporation"...
  3>


This chapter finishes a quick introduction to using Swift and Swift packages for command line 
utilities. The remainder of this book comprises machine learning, natural language processing, and 
semantic web/linked data examples. 


Part 2: Apple’s CoreML and NLP 
Libraries 


In this part we cover: 


+ Short introduction to the ideas behind Deep Learning
+ Introduction of CoreML
+ Examples using CoreML
+ Introduction of NLP
+ Examples using NLP libraries


Deep Learning Introduction 


Apple’s work in smoothly integrating deep learning into their developer tools for macOS, iOS, and 
iPadOS applications is in my opinion nothing short of brilliant. We will finish this book with an 
application that uses two deep learning models that provide almost all of the functionality of the 
application. 


Before diving into Apple’s CoreML libraries in later chapters we will take a shallow dive into the
principles of deep learning and take a lay-of-the-land look at the most commonly used types of
models. This chapter has no example programs and is intended as background material.


Most of my professional career since 2014 has involved Deep Learning, mostly with TensorFlow 
using the Keras APIs. In the late 1980s I was on a DARPA neural network technology advisory 
panel for a year, I wrote the first prototype of the SAIC ANSim neural network library commercial 
product, and I wrote the neural network prediction code for a bomb detector my company designed 
and built for the FAA for deployment in airports. More recently I have used GAN (generative 
adversarial networks) models for synthesizing numeric spreadsheet data and LSTM (long short term 
memory) models to synthesize highly structured text data like nested JSON and for NLP (natural 
language processing). I have also written a product recommendation model for an online store using 
TensorFlow Recommenders. I have several USA and European patents using neural network and 
Deep Learning technology. 


Here we will learn a vocabulary for discussing Deep Learning neural network models and look at 
possible architectures. 


If you want to use Deep Learning professionally, there are two specific online resources that I
recommend: Andrew Ng leads the efforts at deeplearning.ai¹⁵ and Jeremy Howard leads the efforts
at fast.ai¹⁶.


There are many Deep Learning neural architectures in current practical use; a few types that I use 
are: 


+ Multi-layer perceptron networks with many fully connected layers. An input layer contains 
placeholders for input data. Each element in the input layer is connected by a two-dimensional 
weight matrix to each element in the first hidden layer. We can use any number of fully 
connected hidden layers, with the last hidden layer connected to an output layer. 

+ Convolutional networks for image processing and text classification. Convolutions, or filters,
are small windows that can process input images (filters are two-dimensional) or sequences like 
text (filters are one-dimensional). Each filter uses a single set of learned weights independent 
of where the filter is applied in an input image or input sequence. 





+ Autoencoders have the same number of input layer and output layer elements with one or
more hidden fully connected layers. Autoencoders are trained to produce the same output as
training input values using a relatively small number of hidden layer elements. Autoencoders
are capable of removing noise in input data.

+ LSTM (long short term memory) models process elements in a sequence in order and are capable
of remembering patterns that they have seen earlier in the sequence.

+ GAN (generative adversarial networks) models comprise two different and competing neural
models, the generator and the discriminator. GANs are often trained on input images (although
in my work I have applied GANs to two-dimensional numeric spreadsheet data). The generator
model takes as input a “latent input vector” (this is just a vector of specific size with random
values) and generates a random output image. The weights of the generator model are trained
to produce random images that are similar to how training images look. The discriminator
model is trained to recognize if an arbitrary output image is original training data or an image
created by the generator model. The generator and discriminator models are trained together.



¹⁵https://www.deeplearning.ai/
¹⁶https://www.fast.ai/
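The weight sharing described above for convolutional networks can be sketched in a few lines of Swift. This is only an illustration under stated assumptions: the function name convolve1D and the hand-picked filter values are hypothetical (a real network learns its filter weights), and like most Deep Learning libraries it slides the filter without flipping it.

```swift
// Slide one small filter across a sequence. The same weights are reused
// at every position, which is the key idea behind convolutional layers.
func convolve1D(_ input: [Double], filter: [Double]) -> [Double] {
    guard !filter.isEmpty, input.count >= filter.count else { return [] }
    var output: [Double] = []
    for start in 0...(input.count - filter.count) {
        var sum = 0.0
        for (offset, weight) in filter.enumerated() {
            sum += input[start + offset] * weight
        }
        output.append(sum)
    }
    return output
}

let sequence: [Double] = [1, 2, 3, 4, 5]
// Hypothetical hand-picked weights; training would normally choose these.
let edgeFilter: [Double] = [-1, 0, 1]
print(convolve1D(sequence, filter: edgeFilter))
```

A two-dimensional image filter works the same way, with the window sliding over rows and columns instead of a single axis.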


The core functionality of libraries like TensorFlow is written in C++ and takes advantage of special
hardware like GPUs, custom ASICs, and devices like Google’s TPUs. Most people who work with
Deep Learning models don’t need to even be aware of the low level optimizations used to make
training and using Deep Learning models more efficient. That said, in the following section I am
going to show you how simple neural networks are trained and used.


Simple Multi-layer Perceptron Neural Networks 


I use the terms multi-layer perceptron neural networks, backpropagation neural networks, and
delta-rule networks interchangeably. Backpropagation refers to the model training process of
calculating the output errors when training inputs are passed in the forward direction from the
input layer, to hidden layers, and then to the output layer. There will be an error which is the
difference between the calculated outputs and the training outputs. This error can be used to adjust
the weights from the last hidden layer to the output layer to reduce the error. The error is then
backpropagated through the hidden layers, updating all weights in the model. I have detailed example
code in several of my older artificial intelligence books. Here I am satisfied to give you an
intuition of how simple neural networks are trained.


The basic idea is that we start with a network initialized with random weights and for each training
case we propagate the inputs through the network towards the output neurons, calculate the output
errors, and back up the errors from the output neurons towards the input neurons in order to
make small changes to the weights to lower the error for the current training example. We repeat
this process by cycling through the training examples many times.
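The training loop just described can be sketched in plain Swift. This is a minimal illustration and not the code from my earlier books: the TinyNetwork type, the fixed starting weights (stand-ins for small random values, chosen so runs are repeatable), the logical OR training data, and the learning rate are all assumptions made for the example.

```swift
import Foundation

func sigmoid(_ x: Double) -> Double { 1.0 / (1.0 + exp(-x)) }

// A tiny network: 2 inputs, 2 hidden neurons, 1 output neuron.
struct TinyNetwork {
    // hiddenWeights[j] holds the two input weights and a bias for hidden neuron j.
    var hiddenWeights: [[Double]] = [[0.3, -0.2, 0.1], [-0.4, 0.5, -0.1]]
    // Two hidden-to-output weights followed by the output bias.
    var outputWeights: [Double] = [0.2, -0.3, 0.05]

    // Forward pass: inputs -> hidden layer -> output neuron.
    func forward(_ input: [Double]) -> (hidden: [Double], output: Double) {
        let hidden = hiddenWeights.map { w in
            sigmoid(w[0] * input[0] + w[1] * input[1] + w[2])
        }
        let output = sigmoid(outputWeights[0] * hidden[0] +
                             outputWeights[1] * hidden[1] + outputWeights[2])
        return (hidden, output)
    }

    // One training step for one example.
    mutating func train(input: [Double], target: Double, learningRate: Double) {
        let (hidden, output) = forward(input)
        // Output error scaled by the sigmoid derivative y * (1 - y).
        let outputDelta = (target - output) * output * (1.0 - output)
        // Back up the error to each hidden neuron through its outgoing weight.
        let hiddenDeltas = (0..<2).map { j in
            outputDelta * outputWeights[j] * hidden[j] * (1.0 - hidden[j])
        }
        // Make small weight changes that lower the error for this example.
        for j in 0..<2 {
            outputWeights[j] += learningRate * outputDelta * hidden[j]
            hiddenWeights[j][0] += learningRate * hiddenDeltas[j] * input[0]
            hiddenWeights[j][1] += learningRate * hiddenDeltas[j] * input[1]
            hiddenWeights[j][2] += learningRate * hiddenDeltas[j]
        }
        outputWeights[2] += learningRate * outputDelta
    }

    // Sum of squared errors over a set of examples.
    func totalError(_ examples: [(input: [Double], target: Double)]) -> Double {
        examples.reduce(0.0) { sum, ex in
            let diff = ex.target - forward(ex.input).output
            return sum + 0.5 * diff * diff
        }
    }
}

// Illustrative training data: logical OR.
let orExamples: [(input: [Double], target: Double)] = [
    ([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)
]
var network = TinyNetwork()
let errorBefore = network.totalError(orExamples)
for _ in 0..<5000 {                  // cycle through the examples many times
    for example in orExamples {
        network.train(input: example.input, target: example.target,
                      learningRate: 0.5)
    }
}
let errorAfter = network.totalError(orExamples)
print("error before:", errorBefore, "error after:", errorAfter)
```

Frameworks perform the same steps with matrix operations and automatic differentiation, which is why the hand-written version is only useful for building intuition.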


The following figure shows a simple backpropagation network with one hidden layer. Neurons in
adjacent layers are connected by floating point connection strength weights. These weights start
out as small random values that change as the network is trained. Weights are represented in the
following figure by arrows; in the code the weights connecting the input to the output neurons are
represented as a two-dimensional array.


[Figure: Example Backpropagation network with One Hidden Layer. Labels: input neuron layer
(Input 1, Input 2) and output neuron layer (Output 1, Output 2), connected by weights.]


Each non-input neuron has an activation value that is calculated from the activation values of 
connected neurons feeding into it, gated (adjusted) by the connection weights. For example, in 
the above figure, the value of Output 1 neuron is calculated by summing the activation of Input 
1 times weight W1,1 and Input 2 activation times weight W2,1 and applying a “squashing function” 
like Sigmoid or ReLU (see figures below) to this sum to get the final value for Output 1’s activation
value. We want to flatten activation values to a relatively small range but still maintain relative 
values. To do this flattening we use the Sigmoid function that is seen in the next figure, along with 
the derivative of the Sigmoid function which we will use in the code for training a network by 
adjusting the weights. 
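The activation calculation described above can be written out directly. This is a minimal sketch: the function names and the example weight values are made up for illustration, with weights[i][j] standing for the weight from input neuron i to output neuron j (W1,1, W2,1, and so on in the text).

```swift
import Foundation

// The Sigmoid "squashing function" maps any value into the range (0, 1).
func sigmoid(_ x: Double) -> Double {
    return 1.0 / (1.0 + exp(-x))
}

// Derivative of the Sigmoid function (SigmoidP), used when adjusting weights.
func sigmoidP(_ x: Double) -> Double {
    let s = sigmoid(x)
    return s * (1.0 - s)
}

// weights[i][j] is the weight from input neuron i to output neuron j.
func activations(inputs: [Double], weights: [[Double]]) -> [Double] {
    let outputCount = weights.first?.count ?? 0
    return (0..<outputCount).map { (j: Int) -> Double in
        var sum = 0.0
        for i in inputs.indices {
            sum += inputs[i] * weights[i][j]   // input activation times weight
        }
        return sigmoid(sum)                    // squash the weighted sum
    }
}

// Hypothetical input activations and weights.
let exampleInputs = [1.0, 0.5]
let exampleWeights = [[0.4, -0.6],
                      [0.2, 0.8]]
print(activations(inputs: exampleInputs, weights: exampleWeights))
```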


[Figure: Sigmoid Function and Derivative of Sigmoid Function (SigmoidP)]


Simple neural network architectures with just one or two hidden layers are easy to train using
backpropagation and I have from-scratch code (using no libraries) for this in several of my previous
books. However, frameworks like TensorFlow have the huge advantage that small models you
experiment with on your laptop can be scaled to more parameters (usually this means more neurons
in hidden layers which increases the number of weights in a model) and run in the cloud using
multiple GPUs.


Except for pedagogical purposes, I now never write neural network code from scratch. Instead, I take
advantage of the many person-years of engineering work put into the development of frameworks
like TensorFlow, PyTorch, mxnet, etc. In the next chapter we move on to examples built with Apple’s
CoreML.


Deep Learning 


Deep Learning models are generally understood to have many more hidden layers than simple
multi-layer perceptron neural networks and often comprise multiple simple models combined together
in series or in parallel. Complex architectures can be iteratively developed by manually adjusting
the size of model components, changing the components, etc. Alternatively, model architecture search
can be automated. At Capital One I used Google’s AdaNet project¹⁷ that efficiently searches for
effective model architectures inside a single TensorFlow session. Now all major cloud compute
providers support some form of AutoML. You need to decide for yourself how much effort
you want to put into deeply understanding the technology, or simply learning how to use pre-trained
models.





¹⁷https://github.com/tensorflow/adanet


Using Apple’s Core ML Machine 
Learning and Deep Learning Libraries 


Please note that this chapter is specific to Apple’s libraries using pre-trained deep learning models.
I assume that you are generally familiar with Apple’s CoreML documentation¹⁸.


There are two example GitHub repositories for this chapter: 


+ https://github.com/mark-watson/create_deep_learning_model_swift¹⁹ generates a deep learning
model and saves it for reuse.

+ https://github.com/mark-watson/swift-coreml-wisconsin_data_predict_with_model²⁰ uses the
trained model.


In the last chapter we will use two deep learning models in a macOS application that is available on
Apple’s App Store.


If you have taken a class in Machine Learning or Deep Learning, you learned how to divide a training
data set into separate training, development (often referred to as “dev” sets), and test data sets.
This process is handled internally by the CoreML libraries we use here so we will only be using a
single training data file. The CoreML APIs we use here perform a type of AutoML (automatic machine
learning) by trying to train a model using several model types and choosing the model type with the
best accuracy. This is convenient and saves engineering time. A trained model imported into Xcode
automatically generates Swift APIs for using the model. You can also take a trained CoreML model
and use it in Python programs (documentation for Python use cases²¹).
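For readers who have not seen the training/dev/test division written out, here is a minimal sketch in plain Swift. It is not how the CoreML libraries do it internally; the function name splitDataset and the 80/10/10 fractions are assumptions chosen for illustration.

```swift
// Shuffle the rows, then cut them into training, dev, and test sets.
func splitDataset<T>(_ rows: [T], trainFraction: Double = 0.8,
                     devFraction: Double = 0.1)
    -> (train: [T], dev: [T], test: [T]) {
    let shuffled = rows.shuffled()
    let total = Double(shuffled.count)
    let trainEnd = min(shuffled.count, Int((total * trainFraction).rounded()))
    let devEnd = min(shuffled.count,
                     trainEnd + Int((total * devFraction).rounded()))
    return (Array(shuffled[..<trainEnd]),
            Array(shuffled[trainEnd..<devEnd]),
            Array(shuffled[devEnd...]))   // whatever is left becomes the test set
}

// Stand-in data: row numbers instead of real CSV rows.
let rowIds = Array(1...683)
let parts = splitDataset(rowIds)
print(parts.train.count, parts.dev.count, parts.test.count)
```

The model only ever sees the training rows while its weights are fitted; the dev rows guide choices such as model type, and the test rows are held back for a final accuracy estimate.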


Training a Classification Model For the University of 
Wisconsin Cancer Data 


When building the example model (data in files wisconsin.mlmodel*), a Swift file wisconsin.swift
is auto-generated. In the project Makefile, notice that the make target clean removes these files:





build_model: clean
	swift build
	swift run

clean:
	rm -f Sources/wisconsin_data/wisconsin.mlmodel*
	rm -f Sources/wisconsin_data/wisconsin.swift



¹⁸https://developer.apple.com/documentation/coreml
¹⁹https://github.com/mark-watson/create_deep_learning_model_swift
²⁰https://github.com/mark-watson/swift-coreml-wisconsin_data_predict_with_model
²¹https://coremltools.readme.io/docs/mlmodel


The file Sources/wisconsin_data/main.swift reads a training file in CSV format and uses the
CoreML libraries to train a prediction model. You might want to uncomment the //print(dataTable)
statement to see the contents of the CSV formatted (i.e., a spreadsheet file) training data file. In
the definition of regressorColumns we list the columns in the input training CSV file that we will
use to build our model (in this case we use all the data features).


In this example we use Apple’s APIs for MLClassifier that trains the following learning algorithms 
and keeps the best for the saved model: 


+ Boosted trees classifier
+ Random forest classifier
+ Decision tree classifier
+ SVM
+ Logistic regression


There is optional material at the end of this chapter with background for these five types of models. 


import Foundation
import CoreML
import CreateML

func create_model() {
  if #available(macOS 10.14, *) {
    let fileUrl = URL(fileURLWithPath: "labeled_cancer_data.csv")
    print(fileUrl)
    if let dataTable = try? MLDataTable(contentsOf: fileUrl) {
      //print(dataTable)
      let regressorColumns = ["Cl.thickness", "Cell.size",
                              "Cell.shape", "Marg.adhesion",
                              "Epith.c.size", "Bare.nuclei",
                              "Bl.cromatin", "Normal.nucleoli",
                              "Mitoses", "Class"]
      // Classifier:
      let classifierTable = dataTable[regressorColumns]
      // Hold out 20% of the rows for evaluation; train on the remaining 80%.
      let (classifierEvaluationTable, classifierTrainingTable) =
        classifierTable.randomSplit(by: 0.20, seed: 5)
      let classifier = try! MLClassifier(trainingData: classifierTrainingTable,
                                         targetColumn: "Class")
      print("++ classifier.description:", classifier)
      /// Classifier training accuracy as a percentage
      let trainingError = classifier.trainingMetrics.classificationError
      let trainingAccuracy = (1.0 - trainingError) * 100
      print("trainingAccuracy:", trainingAccuracy)
      /// Classifier validation accuracy as a percentage
      let validationError = classifier.validationMetrics.classificationError
      print("validationError:", validationError)
      let validationAccuracy = (1.0 - validationError) * 100
      print("validationAccuracy:", validationAccuracy)
      /// Evaluate the classifier
      let classifierEvaluation =
        classifier.evaluation(on: classifierEvaluationTable)
      /// Classifier evaluation accuracy as a percentage
      let evaluationError = classifierEvaluation.classificationError
      print("evaluationError:", evaluationError)
      let evaluationAccuracy = (1.0 - evaluationError) * 100
      print("evaluationAccuracy:", evaluationAccuracy)
      let classifierMetadata =
        MLModelMetadata(author: "Mark Watson",
                        shortDescription: "Wisconsin Cancer Dataset",
                        version: "1.0")
      /// Save the trained classifier model in the package Sources directory.
      let _ = try? classifier.write(to: URL(fileURLWithPath:
                                      "Sources/wisconsin_data/wisconsin.mlmodel"),
                                    metadata: classifierMetadata)
    }
  }
}

create_model()


$ make


rm -f Sources/wisconsin_data/wisconsin.mlmodel*
rm -f Sources/wisconsin_data/wisconsin.swift
swift build
[0/0] Build complete!
swift run
[0/0] Build complete!
column_type_hints = {}
Finished parsing file /Users/markw_1/GITHUB/wisconsin_data_create_model/labeled_cancer_data.csv
Parsing completed. Parsed 100 lines in 0.01006 secs.
Finished parsing file /Users/markw_1/GITHUB/wisconsin_data_create_model/labeled_cancer_data.csv
Parsing completed. Parsed 683 lines in 0.003458 secs.
Using 9 features to train a model to predict Class.

Automatically generating validation set from 5% of the data.

Boosted trees classifier:
Number of examples          : 522
Number of classes           : 2
Number of feature columns   : 9
Number of unpacked features : 9
+-----------+--------------+-------------------+---------------------+-------------------+---------------------+
| Iteration | Elapsed Time | Training Accuracy | Validation Accuracy | Training Log Loss | Validation Log Loss |
+-----------+--------------+-------------------+---------------------+-------------------+---------------------+
| 1         | 0.006108     | 0.988506          | 0.904762            | 0.459892          | 0.520190            |
| 2         | 0.010718     | 0.984674          | 0.857143            | 0.329561          | 0.412062            |
| 3         | 0.015658     | 0.984674          | 0.857143            | 0.245602          | 0.337748            |
| 4         | 0.020130     | 0.986590          | 0.857143            | 0.186529          | 0.291379            |
| 5         | 0.024706     | 0.990421          | 0.857143            | 0.144306          | 0.262312            |
| 10        | 0.043619     | 0.996169          | 0.904762            | 0.049835          | 0.180445            |
+-----------+--------------+-------------------+---------------------+-------------------+---------------------+

Random forest classifier:
Number of examples          : 522
Number of classes           : 2
Number of feature columns   : 9
Number of unpacked features : 9
+-----------+--------------+-------------------+---------------------+-------------------+---------------------+
| Iteration | Elapsed Time | Training Accuracy | Validation Accuracy | Training Log Loss | Validation Log Loss |
+-----------+--------------+-------------------+---------------------+-------------------+---------------------+
| 1         | 0.002102     | 0.984674          | 0.904762            | 0.173523          | 0.305533            |
| 2         | 0.003890     | 0.986590          | 0.904762            | 0.171982          | 0.306030            |
| 3         | 0.005461     | 0.984674          | 0.904762            | 0.173111          | 0.276622            |
| 4         | 0.006758     | 0.984674          | 0.904762            | 0.171693          | 0.285118            |
| 5         | 0.007484     | 0.982759          | 0.952381            | 0.172563          | 0.273630            |
| 10        | 0.011962     | 0.984674          | 0.952381            | 0.171195          | 0.261683            |
+-----------+--------------+-------------------+---------------------+-------------------+---------------------+

Decision tree classifier:
Number of examples          : 522
Number of classes           : 2
Number of feature columns   : 9
Number of unpacked features : 9
+-----------+--------------+-------------------+---------------------+-------------------+---------------------+
| Iteration | Elapsed Time | Training Accuracy | Validation Accuracy | Training Log Loss | Validation Log Loss |
+-----------+--------------+-------------------+---------------------+-------------------+---------------------+
| 1         | 0.002216     | 0.988506          | 0.904762            | 0.170105          | 0.352356            |
+-----------+--------------+-------------------+---------------------+-------------------+---------------------+

SVM:
Number of examples          : 522
Number of classes           : 2
Number of feature columns   : 9
Number of unpacked features : 9
Number of coefficients      : 10
Starting L-BFGS
+-----------+----------+-----------+--------------+-------------------+---------------------+
| Iteration | Passes   | Step size | Elapsed Time | Training Accuracy | Validation Accuracy |
+-----------+----------+-----------+--------------+-------------------+---------------------+
| 0         | 2        | 1.000000  | 0.000629     | 0.350575          | 0.285714            |
| 1         | 6        | 3.000000  | 0.001610     | 0.908046          | 0.857143            |
| 2         | 7        | 3.000000  | 0.002089     | 0.840996          | 0.809524            |
| 3         | 12       | 1.053671  | 0.004093     | 0.961686          | 0.952381            |
| 4         | 13       | 1.053671  | 0.004610     | 0.959770          | 0.904762            |
| 9         | 22       | 1.053671  | 0.007046     | 0.971264          | 0.904762            |
+-----------+----------+-----------+--------------+-------------------+---------------------+

Logistic regression:
Number of examples          : 522
Number of classes           : 2
Number of feature columns   : 9
Number of unpacked features : 9
Number of coefficients      : 10
Starting Newton Method
+-----------+----------+--------------+-------------------+---------------------+
| Iteration | Passes   | Elapsed Time | Training Accuracy | Validation Accuracy |
+-----------+----------+--------------+-------------------+---------------------+
| 1         | 2        | 0.000373     | 0.967433          | 0.904762            |
| 2         | 3        | 0.000724     | 0.969349          | 0.904762            |
| 3         | 4        | 0.001080     | 0.975096          | 0.904762            |
| 4         | 5        | 0.001427     | 0.978927          | 0.904762            |
| 5         | 6        | 0.001796     | 0.978927          | 0.904762            |
| 7         | 8        | 0.002388     | 0.978927          | 0.904762            |
+-----------+----------+--------------+-------------------+---------------------+
SUCCESS: Optimal solution found.

++ classifier.description: RandomForestClassifier

Parameters
Max Depth: 6
Max Iterations: 10
Min Loss Reduction: 0.0
Min Child Weight: 0.1
Random Seed: 42
Row Subsample: 0.8
Column Subsample: 0.8

Performance on Training Data
Number of examples: 522
Number of classes: 2
Accuracy: 98.47%

Performance on Validation Data
Number of examples: 21
Number of classes: 2
Accuracy: 95.24%

trainingAccuracy: 98.46743295019157
validationError: 0.04761904761904767
validationAccuracy: 95.23809523809523
evaluationError: 0.050000000000000044
evaluationAccuracy: 95.0
Trained model successfully saved at /Users/markw_1/GITHUB/SwiftAI-book-code/wisconsin_data_create_model/Sources/wisconsin_data/wisconsin.mlmodel.


Using the Classification Model for the University of 
Wisconsin Cancer Data 


The GitHub repo https://github.com/mark-watson/swift-coreml-wisconsin_data_predict_with_model²²
contains a Makefile with a target for building the prediction code:





build_preditor: clean
	cp ../wisconsin_data_create_model/Sources/wisconsin_data/wisconsin.mlmodel \
	   Sources/wisconsin_data/
	cd Sources/wisconsin_data; \
	   xcrun coremlcompiler generate wisconsin.mlmodel --language Swift .
	cd Sources/wisconsin_data; xcrun coremlcompiler compile wisconsin.mlmodel
	swift build
	swift run

clean:
	rm -rf Sources/wisconsin_data/wisconsin.mlmodel*
	rm -rf Sources/wisconsin_data/wisconsin.swift



²²https://github.com/mark-watson/swift-coreml-wisconsin_data_predict_with_model


The file swift-coreml-wisconsin_data_predict_with_model/Sources/wisconsin_data/main.swift 
contains the prediction code: 


import Foundation
import CoreML
import CreateML

func predict() {
  if #available(macOS 10.14, *) {
    let modelUrl = URL(fileURLWithPath:
                         "Sources/wisconsin_data/wisconsin.mlmodelc")
    let pretrained_model = try! wisconsin(contentsOf: modelUrl,
                                          configuration: MLModelConfiguration())
    let sampleInput = wisconsinInput(Cl_thickness: 3, Cell_size: 2,
      Cell_shape: 5, Marg_adhesion: 8, Epith_c_size: 8, Bare_nuclei: 2,
      Bl_cromatin: 3, Normal_nucleoli: 7, Mitoses: 4)
    let a_prediction = try! pretrained_model.prediction(input: sampleInput)
    print(a_prediction.featureNames)
    print("Class:", a_prediction.featureValue(for: "Class")!)
    print("ClassProbability:",
          a_prediction.featureValue(for: "ClassProbability")!)
  }
}

predict()


We can run the prediction example on the command line:

$ make
rm -rf Sources/wisconsin_data/wisconsin.mlmodel*
rm -rf Sources/wisconsin_data/wisconsin.swift
cp ../wisconsin_data_create_model/Sources/wisconsin_data/wisconsin.mlmodel \
   Sources/wisconsin_data/
cd Sources/wisconsin_data; \
   xcrun coremlcompiler generate wisconsin.mlmodel --language Swift .
/Users/markw_1/GITHUB/wisconsin_data_predict_with_model/Sources/wisconsin_data/wisconsin.swift
cd Sources/wisconsin_data; xcrun coremlcompiler compile wisconsin.mlmodel
/Users/markw_1/GITHUB/wisconsin_data_predict_with_model/Sources/wisconsin_data/wisconsin.mlmodelc/coremldata.bin
/Users/markw_1/GITHUB/wisconsin_data_predict_with_model/Sources/wisconsin_data/wisconsin.mlmodelc/analytics/coremldata.bin
/Users/markw_1/GITHUB/wisconsin_data_predict_with_model/Sources/wisconsin_data/wisconsin.mlmodelc/model0/coremldata.bin
/Users/markw_1/GITHUB/wisconsin_data_predict_with_model/Sources/wisconsin_data/wisconsin.mlmodelc/model1/coremldata.bin
/Users/markw_1/GITHUB/wisconsin_data_predict_with_model/Sources/wisconsin_data/wisconsin.mlmodelc/model1/_BQQ0@.DAT
swift build
'wisconsin_data' /Users/markw_1/GITHUB/wisconsin_data_predict_with_model: warning: found 1 file(s) which are unhandled; explicitly declare them as resources or exclude from the target
    /Users/markw_1/GITHUB/wisconsin_data_predict_with_model/Sources/wisconsin_data/wisconsin.mlmodel
[4/4] Build complete!
swift run
'wisconsin_data' /Users/markw_1/GITHUB/wisconsin_data_predict_with_model: warning: found 1 file(s) which are unhandled; explicitly declare them as resources or exclude from the target
    /Users/markw_1/GITHUB/wisconsin_data_predict_with_model/Sources/wisconsin_data/wisconsin.mlmodel
[0/0] Build complete!
["Class", "ClassProbability"]
Class: Int : 0
ClassProbability: Dictionary : {
    0 = "0.7969631955468645";
    1 = "0.2030368044531356";
}


I recommend that you read through Apple’s documentation and bookmark the page for the CoreML
classification models²³.


Boosted Trees Classifiers are ensemble models: they are made up of simpler learned decision trees
whose outputs are summed together.


Random Forest Classifiers are similar to Boosted Trees Classifiers except that each sub-classifier
in the ensemble is trained with a subset of the data.
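To make the ensemble idea concrete, here is a minimal sketch of how a random forest style model can combine its sub-classifiers. It is not Apple's implementation: the SubClassifier type, the hand-written threshold rules standing in for learned decision trees, and both function names are assumptions for illustration only.

```swift
// A sub-classifier is modeled as a closure from a feature vector to a class label.
typealias SubClassifier = ([Double]) -> Int

// Every sub-classifier votes and the most common label wins.
func majorityVote(_ classifiers: [SubClassifier], features: [Double]) -> Int? {
    var counts: [Int: Int] = [:]
    for classify in classifiers {
        counts[classify(features), default: 0] += 1
    }
    // Ties go to the smaller label so the result is deterministic.
    return counts.max { a, b in
        a.value != b.value ? a.value < b.value : a.key > b.key
    }?.key
}

// Draw a random subset (with replacement) of the training rows for one sub-classifier.
func bootstrapSample<T>(_ rows: [T], count: Int) -> [T] {
    guard !rows.isEmpty else { return [] }
    return (0..<count).map { _ in rows.randomElement()! }
}

// Hypothetical one-rule "trees", each looking at a different feature.
let stumps: [SubClassifier] = [
    { $0[0] > 5 ? 1 : 0 },
    { $0[1] > 5 ? 1 : 0 },
    { $0[2] > 5 ? 1 : 0 }
]
print(majorityVote(stumps, features: [8, 7, 2]) ?? -1)
```

A boosted trees model differs mainly in the combination step: instead of counting votes it adds up the numeric outputs of trees that were trained one after another, each correcting errors left by the previous ones.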


You might also want to review Apple’s documentation for the following conventional Machine 
Learning algorithms: Decision tree classifier, SVM, and Logistic regression. 





²³https://apple.github.io/turicreate/docs/userguide/supervised-learning/classifier.html


Natural Language Processing Using 
Apple’s Natural Language Framework 


I have been working in the field of Natural Language Processing (NLP) since 1985 so I ‘lived through’
the revolutionary change in NLP that has occurred since 2014: Deep Learning results out-classed
results from previous symbolic methods.


https://developer.apple.com/documentation/naturallanguage 


I will not cover older symbolic methods of NLP here; rather, I refer you to my previous books Practical
Artificial Intelligence Programming With Java²⁴, Loving Common Lisp, or the Savvy Programmer’s
Secret Weapon²⁵, and Haskell Tutorial and Cookbook²⁶ for examples. We get better results using
Deep Learning (DL) for NLP and the libraries that Apple provides.


You will learn how to apply both DL and NLP by using the state-of-the-art full-feature libraries that 
Apple provides in their iOS and macOS development tools. 


Using Apple’s NaturalLanguage Swift Library 


We will use one of Apple’s NLP libraries consisting of pre-built models in the last chapter of this book.
In order to fully understand the example in the last chapter you will need to read Apple’s high-level
discussion of using CoreML https://developer.apple.com/documentation/coreml²⁷ and their specific
support for NLP https://developer.apple.com/documentation/naturallanguage/²⁸.


There are many pre-trained CoreML compatible models on the web, both from Apple and from
third parties (e.g., https://github.com/likedan/Awesome-CoreML-Models²⁹).


Apple also provides tools for converting TensorFlow and PyTorch models to be compatible with
CoreML https://coremltools.readme.io/docs³⁰.


A simple Wrapper Library for Apple’s NLP Models 


I will not go into too much detail here but I created a small wrapper library for Apple’s NLP
models that will make it easier for you to jump in and have fun experimenting with them:
https://github.com/mark-watson/Nlp_swift³¹.





²⁴https://leanpub.com/javaai
²⁵https://leanpub.com/lovinglisp
²⁶https://leanpub.com/haskell-cookbook
²⁷https://developer.apple.com/documentation/coreml
²⁸https://developer.apple.com/documentation/naturallanguage/
²⁹https://github.com/likedan/Awesome-CoreML-Models
³⁰https://coremltools.readme.io/docs
³¹https://github.com/mark-watson/Nlp_swift


The main library implementation file is:


import Foundation
import NaturalLanguage

let tagger = NSLinguisticTagger(tagSchemes:[.tokenType, .language, .lexicalClass,
                                            .nameType, .lemma], options: 0)
let options: NSLinguisticTagger.Options = [.omitPunctuation, .omitWhitespace,
                                           .joinNames]

@available(OSX 10.13, *)
public func getEntities(for text: String) -> [(String, String)] {
    var words: [(String, String)] = []
    tagger.string = text
    let range = NSRange(location: 0, length: text.utf16.count)
    tagger.enumerateTags(in: range, unit: .word, scheme: .nameType,
                         options: options) { tag, tokenRange, stop in
        let word = (text as NSString).substring(with: tokenRange)
        words.append((word, tag?.rawValue ?? "unkown"))
    }
    return words
}

@available(OSX 10.13, *)
public func getLemmas(for text: String) -> [(String, String)] {
    var words: [(String, String)] = []
    tagger.string = text
    let range = NSRange(location: 0, length: text.utf16.count)
    tagger.enumerateTags(in: range, unit: .word, scheme: .lemma,
                         options: options) { tag, tokenRange, stop in
        let word = (text as NSString).substring(with: tokenRange)
        words.append((word, tag?.rawValue ?? "unkown"))
    }
    return words
}


Here is some test code:

let quote = """
President George Bush went to Mexico with IBM representatives. Here's to the crazy \
ones. The misfits. The rebels. The troublemakers. The round pegs in the square \
holes. The ones who see things differently. They're not fond of rules. And they \
have no respect for the status quo. You can quote them, disagree with them, \
glorify or vilify them. About the only thing you can't do is ignore them. Because \
they change things. They push the human race forward. And while some may see them \
as the crazy ones, we see genius. Because the people who are crazy enough to think \
they can change the world, are the ones who do. - Steve Jobs (Founder of Apple Inc.)
"""
if #available(OSX 10.13, *) {
    print("\nEntities:\n")
    print(getEntities(for: quote))
    print("\nLemmas:\n")
    print(getLemmas(for: quote))
}
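The wrapper above is built on NSLinguisticTagger. The NaturalLanguage framework also provides NLTagger, and the following sketch shows how entity extraction might look with it. This is an alternative illustration, not the code in my wrapper library: the function name namedEntities is made up, the availability versions are my assumption, and the canImport guard is only there so that the file still compiles on platforms without Apple's framework (where it returns an empty list).

```swift
import Foundation
#if canImport(NaturalLanguage)
import NaturalLanguage
#endif

// Returns (word, tag) pairs for people, places, and organizations.
func namedEntities(in text: String) -> [(String, String)] {
    var results: [(String, String)] = []
    #if canImport(NaturalLanguage)
    if #available(macOS 10.14, iOS 12.0, *) {
        let tagger = NLTagger(tagSchemes: [.nameType])
        tagger.string = text
        let options: NLTagger.Options = [.omitPunctuation, .omitWhitespace,
                                         .joinNames]
        tagger.enumerateTags(in: text.startIndex..<text.endIndex, unit: .word,
                             scheme: .nameType, options: options) { tag, tokenRange in
            if let tag = tag,
               [NLTag.personalName, .placeName, .organizationName].contains(tag) {
                results.append((String(text[tokenRange]), tag.rawValue))
            }
            return true   // keep enumerating
        }
    }
    #endif
    return results
}

print(namedEntities(in: "President George Bush went to Mexico with IBM representatives."))
```

Unlike the wrapper functions, this version keeps only tokens that were tagged as names, so ordinary words do not appear in the result.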


Using the OpenAI APIs


I have been working as an artificial intelligence practitioner since 1982 and the capability of the beta
OpenAI APIs is the most impressive thing that I have seen (so far!) in my career. These APIs use the
GPT-3 model. You will need to apply to OpenAI for a free API access key. I use their APIs frequently
enough in my projects that I am on their paid plan.


I recommend reading the online documentation for the APIs³² to see all the capabilities of the beta
OpenAI APIs. Let’s start by jumping into the example code, which is in a GitHub repository
https://github.com/mark-watson/OpenAI_swift³³ that you can use in your projects.


The library that I wrote for this chapter supports three functions: for completing text, summarizing
text, and answering general questions. The single OpenAI model that the beta OpenAI APIs use is
fairly general purpose and can generate cooking directions when given an ingredient list, correct
grammar, write an advertisement from a product description, generate spreadsheet data from data
descriptions in English text, etc.


Given the examples from https://beta.openai.com³⁴ and the Swift examples here, you should be
able to modify my example code to use any of the functionality that OpenAI documents.


We will look closely at the function completions and then just look at the small differences to the
other two example functions. The definitions for all three exported functions are kept in the file
Sources/OpenAI_swift/OpenAI_swift.swift. You need to request an API key (I had to wait a few weeks
to receive my key) and set the value of the environment variable OPENAI_KEY to your key. You can
add a statement like:


export OPENAI_KEY=sa-hdedds7&dhdhsdffd


to your .profile or other shell resource file that contains your key value (the above key value is 
made-up and invalid). 
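Once the variable is exported, a Swift program can read it through ProcessInfo. The following is a minimal sketch; the helper name openAiKey(from:) is made up for this example, and it takes the environment as a parameter only so that it is easy to try with test values.

```swift
import Foundation

// Look up the API key in a dictionary of environment variables.
// By default this is the environment of the running process.
func openAiKey(from environment: [String: String] =
                   ProcessInfo.processInfo.environment) -> String? {
    guard let key = environment["OPENAI_KEY"], !key.isEmpty else { return nil }
    return key
}

if let key = openAiKey() {
    print("Found an OpenAI key with \(key.count) characters")
} else {
    print("OPENAI_KEY is not set")
}
```

Returning an optional makes the missing-key case explicit, which is friendlier than failing later with an authorization error from the server.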


While the example code makes HTTP requests directly from Swift, I prefer using the curl utility
to experiment with API calls from the command line before starting to write any code.


An example curl command line call to the beta OpenAI APIs is: 





curl \
  https://api.openai.com/v1/engines/davinci/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sa-hdffds7&dhdhsdgffd" \
  -d '{"prompt": "The President went to Congress",
       "max_tokens": 22}'



³²https://beta.openai.com/docs/introduction/key-concepts
³³https://github.com/mark-watson/OpenAI_swift
³⁴https://beta.openai.com


Here the API token “sa-hdffds7&dhdhsdgffd” on line 4 is made up - that is not my API token. All
of the OpenAI APIs expect JSON data with query parameters. To use the completion API, we set
values for prompt and max_tokens. The value of max_tokens is the requested number of returned
words or tokens. We will look at several examples later.
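One way to build this JSON request body in Swift is with a Codable struct and JSONEncoder, which takes care of escaping quotes and other special characters in the prompt. This is a sketch of an alternative and not how my library builds the body; the type CompletionRequest and the function completionBody are names invented for this example.

```swift
import Foundation

// The two parameters used by the completion API call shown above.
struct CompletionRequest: Codable {
    let prompt: String
    let maxTokens: Int

    enum CodingKeys: String, CodingKey {
        case prompt
        case maxTokens = "max_tokens"   // JSON name expected by the API
    }
}

// Encode the request as a JSON string suitable for an HTTP POST body.
func completionBody(prompt: String, maxTokens: Int) throws -> String {
    let request = CompletionRequest(prompt: prompt, maxTokens: maxTokens)
    let data = try JSONEncoder().encode(request)
    return String(decoding: data, as: UTF8.self)
}

if let body = try? completionBody(prompt: "The President went to Congress",
                                  maxTokens: 22) {
    print(body)
}
```

The encoder is worth the few extra lines when prompt text comes from users, because a stray double quote in the prompt would otherwise produce invalid JSON.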


In the file Sources/OpenAI_swift/OpenAI_swift.swift we start with a helper function
openAiHelper that takes a string with the OpenAI API call arguments then extracts the results from
the returned JSON data:


// openAiHost and openai_key are defined elsewhere in this file (not shown here).
func openAiHelper(body: String) -> String {
  var ret = ""
  var content = "{}"
  let requestUrl = URL(string: openAiHost)!
  var request = URLRequest(url: requestUrl)
  request.httpMethod = "POST"
  request.httpBody = body.data(using: String.Encoding.utf8);
  request.setValue("application/json", forHTTPHeaderField: "Content-Type")
  request.setValue("Bearer " + openai_key, forHTTPHeaderField: "Authorization")
  let task = URLSession.shared.dataTask(with: request) { (data, response, error) in
    if let error = error {
      print("-->> Error accessing OpenAI servers: \(error)")
      // Stop the run loop on errors too, otherwise the caller waits forever.
      CFRunLoopStop(CFRunLoopGetMain())
      return
    }
    if let data = data, let s = String(data: data, encoding: .utf8) {
      content = s
      CFRunLoopStop(CFRunLoopGetMain())
    }
  }
  task.resume()
  CFRunLoopRun()  // block until the completion handler stops the run loop
  let c = String(content)
  // Extract the text between the "text" and "index" keys of the returned JSON.
  let i1 = c.range(of: "\"text\": ")
  if let r1 = i1 {
    let i2 = c.range(of: "\"index\":")
    if let r2 = i2 {
      ret = String(String(String(c[r1.lowerBound..<r2.lowerBound])
                            .dropFirst(9)).dropLast(2))
    }
  }
  return ret
}


I extract the generated text from the returned JSON data by searching for the constants “text:”
and “index:” instead of using a JSON parser like I do in the later KGN example.
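

For comparison, here is a minimal sketch of the same extraction done with a JSON parser. It is not
code from the OpenAI_swift library: the function name extractCompletionText is hypothetical, and
the response shape (a top-level “choices” array whose elements carry a “text” field) is an
assumption based on the “text” and “index” keys that the helper searches for.

```swift
import Foundation

// Hypothetical helper (not part of the OpenAI_swift library): pull the
// generated text out of a completion response with a real JSON parser.
// Assumed response shape: {"choices": [{"text": "...", "index": 0, ...}]}
func extractCompletionText(fromJson json: String) -> String? {
    guard let parsed = try? JSONSerialization.jsonObject(with: Data(json.utf8), options: []),
          let top = parsed as? [String: Any],
          let choices = top["choices"] as? [[String: Any]],
          let text = choices.first?["text"] as? String else {
        return nil
    }
    return text
}

// A made-up response, for illustration only:
let sampleResponse = "{\"choices\": [{\"text\": \" and gave a speech.\", \"index\": 0}]}"
print(extractCompletionText(fromJson: sampleResponse) ?? "no text found")
```

A parser-based version returns nil for malformed input instead of silently returning an empty
string, at the cost of a few more lines.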


The three example functions all use this openAiHelper function. The first example function 
completions sets the parameters to complete a text fragment. You have probably seen examples 
of the OpenAI GPT-3 model writing stories, given a starting sentence. We are using the same model 
and functionality here: 


public func completions(promptText: String, maxTokens: Int = 25) -> String {
    let body: String = "{\"prompt\": \"" + promptText + "\", " +
        "\"max_tokens\": \(maxTokens)" + "}"
    return openAiHelper(body: body)
}
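

Building the JSON body by string concatenation breaks if the prompt text itself contains a double
quote or a newline. One possible alternative, shown here only as a sketch, is to let Foundation's
JSONSerialization do the escaping. The function name completionRequestBody is hypothetical and is
not part of the library.

```swift
import Foundation

// Hypothetical alternative to string concatenation: let JSONSerialization
// escape quotes, backslashes, and newlines in the prompt text.
func completionRequestBody(promptText: String, maxTokens: Int = 25) -> String {
    let payload: [String: Any] = ["prompt": promptText, "max_tokens": maxTokens]
    guard let data = try? JSONSerialization.data(withJSONObject: payload, options: []),
          let body = String(data: data, encoding: .utf8) else {
        return "{}"
    }
    return body
}

print(completionRequestBody(promptText: "She said \"hello\" and"))
```

The resulting string could be passed to openAiHelper in place of the hand-built body.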


Note that the OpenAI models are stochastic. When generating output words (or tokens), the model
assigns probabilities to the possible next words and samples a word using these probabilities. As
a simple example, suppose that, given the prompt text “it fell and”, the model could only generate
three words, with probabilities for each word based on this prompt text:


• the 0.8
• that 0.1
• a 0.1


The model would emit the word the 80% of the time, the word that 10% of the time, and the word
a 10% of the time. As a result, the model can generate different completion text for the same text
prompt. Let’s look at some examples using the same prompt text. Notice the stochastic nature of
the returned results with prompt text “He walked to the river” passed twice to the OpenAI GPT-3
model:


First example:


and sat down thinking, the warm evening clotted with insects. The river lapping the bank in the
long grass. He


Another example of text completion:


, the beast running slowly behind him. He looked away from the cave now, using rain and clouds as
his curtain to hide
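

The sampling step described earlier can be sketched in a few lines of Swift. This is only an
illustration of drawing one word from a table of weights, not how the GPT-3 model is implemented;
the function name sampleWord and the weights are hypothetical.

```swift
// Illustrative sketch of sampling one word from a probability table.
// `randomValue` is a number in [0, 1); passing it in makes the function
// easy to test. The weights below are hypothetical.
func sampleWord(from distribution: [(word: String, weight: Double)],
                randomValue: Double) -> String? {
    let total = distribution.reduce(0.0) { $0 + $1.weight }
    guard total > 0 else { return nil }
    // Scale the random value so the weights need not sum to exactly 1.
    let threshold = randomValue * total
    var cumulative = 0.0
    for entry in distribution {
        cumulative += entry.weight
        if threshold < cumulative { return entry.word }
    }
    return distribution.last?.word   // guards against rounding error
}

let nextWords: [(word: String, weight: Double)] = [("the", 0.8), ("that", 0.1), ("a", 0.1)]
for _ in 1...5 {
    let word = sampleWord(from: nextWords, randomValue: Double.random(in: 0..<1))
    print(word ?? "")
}
```

Running the loop several times usually prints mostly “the”, with an occasional “that” or “a”,
which is the same effect that makes two completions of one prompt differ.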


The function summarize is very similar to the function completions except the JSON data passed
to the API has a few additional parameters that let the API know that we want a text summary:


• presence_penalty - penalize words found in the original text (we set this to zero)

• temperature - higher values increase the randomness used to select output tokens. If you set
this to zero, then the same prompt text will always yield the same results (I never use a zero
value).

• top_p - also affects randomness. All examples I have seen use a value of 1.

• frequency_penalty - penalize using the same words repeatedly (I usually set this to zero, but
you should experiment with different values)


When summarizing text, try varying the number of generated tokens to get shorter or longer 
summaries; in the following examples we ask for 24, 90, and 150 output tokens (lines are broken 
to fit page width): 


public func summarize(text: String, maxTokens: Int = 40) -> String {
    let body: String = "{\"prompt\": \"" + text + "\", " +
        "\"max_tokens\": \(maxTokens), \"presence_penalty\": 0.0, \"temperature\": 0.3, " +
        "\"top_p\": 1.0, \"frequency_penalty\": 0.0}"
    return openAiHelper(body: body)
}


Notice the stochastic nature of the returned summarization results with prompt text “Jupiter is the
fifth planet from the Sun and the largest in the Solar System. It is a gas giant with a mass
one-thousandth that of the Sun, but two-and-a-half times that of all the other planets in the Solar
System combined. Jupiter is one of the brightest objects visible to the naked eye in the night sky,
and has been known to ancient civilizations since before recorded history. It is named after the
Roman god Jupiter.[19] When viewed from Earth, Jupiter can be bright enough for its reflected light
to cast visible shadows,[20] and is on average the third-brightest natural object in the night sky
after the Moon and Venus.”:


First summarization example:


Jupiter is a gas giant because it is predominantly composed of hydrogen and
helium; it has a solid core, but it has no surface. Jupiter is a gas giant
because it is predominantly composed


Another summarization example:


The planet is usually the fourth-brightest in the night sky, after the Sun,
Venus and the Moon.

Jupiter is a gas giant because it is predominantly composed of hydrogen


The function questionAnsweering is very similar to the function summarize except the JSON data
passed to the API has one additional parameter that lets the API know that we want a question
answered:


• stop - The OpenAI API examples use the value: ["\n"], which is what I use here.


We also need to prepend the string “nQ: ” to the prompt text and append the string “ nA:”.


Additionally, the model returns a series of answers with the string “nQ:” acting as a delimiter
between the answers.


public func questionAnsweering(question: String) -> String {
    let body: String = "{\"prompt\": \"nQ: " + question + " nA:\", \"max_tokens\": 25, " +
        "\"presence_penalty\": 0.0, \"temperature\": 0.3, \"top_p\": 1.0, " +
        "\"frequency_penalty\": 0.0 , \"stop\": [\"\\n\"]}"
    let answer = openAiHelper(body: body)
    if let i1 = answer.range(of: "nQ:") {
        return String(answer[answer.startIndex..<i1.lowerBound])
        //return String(answer.prefix(i1.lowerBound))
    }
    return answer
}


I strongly urge you to add a debug printout to the question answering code to print the full answer 
before we check for the delimiter string. For some questions, the OpenAI APIs generate a series of 
answers that increase in generality. In the example code we just take the most specific answer. 
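

A debug printout of this kind could look like the following sketch. The helper firstAnswer is
hypothetical (it is not in the library) and simply mirrors the delimiter check in the question
answering function, with an optional print of the full model output first. The sample output
string is made up.

```swift
import Foundation

// Hypothetical helper: print the full model output, then keep only the
// text before the first delimiter (the most specific answer).
func firstAnswer(in fullAnswer: String, delimiter: String = "nQ:",
                 debug: Bool = true) -> String {
    if debug { print("full answer: \(fullAnswer)") }
    if let r = fullAnswer.range(of: delimiter) {
        return String(fullAnswer[..<r.lowerBound])
    }
    return fullAnswer
}

// Made-up model output, for illustration only:
print(firstAnswer(in: "In Vinci, Italy. nQ: Where is Italy? nA: In Europe."))
```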


Let’s look at a few question answering examples and we will discuss possible problems and 
workarounds. The first two examples ask the same question and get back different, but reasonable 
answers. The third example asks a general question. The GPT-3 model is trained using a massive 
amount of text from the web which is why it can generate reasonable answers. Here are two 
examples for answering the question “Where was Leonardo da Vinci born?”: 


In Vinci, Italy. 


And another generated output for the same question:


In Italy.


In addition to reading the beta OpenAI API documentation you might want to read general material
on the use of OpenAI’s GPT-3 model. Since the APIs we are using are in beta, they may change. I
will update this chapter and the source code on GitHub if the APIs change.


Part 3: Knowledge Representation and Data Acquisition


In this part we cover: 


• Introduction to the semantic web and linked data
• A general discussion of Knowledge Representation
• Create Knowledge Graphs from text input
• Knowledge Graph Explorer application


Linked Data and the Semantic Web 


In 2001, Tim Berners-Lee, James Hendler, and Ora Lassila wrote an article for Scientific American
in which they introduced the term Semantic Web. Here I do not capitalize semantic web and I use
the similar term linked data somewhat interchangeably with semantic web.


In the same way that the web allows links between related web pages, linked data supports linking 
associated data on the web together. I view linked data as a relatively simple way to specify 
relationships between data sources on the web while the semantic web has a much larger vision: 
the semantic web has the potential to be the entirety of human knowledge represented as data on 
the web in a form that software agents can work with to answer questions, perform research, and 
to infer new data from existing data. 


While the “web” describes information for human readers, the semantic web is meant to provide
structured data for ingestion by software agents. This distinction will be clear as we compare
WikiPedia, made for human readers, with DBPedia which uses the info boxes on WikiPedia topics to
automatically extract RDF data describing WikiPedia topics. Let’s look at the WikiPedia topic for
the town I live in, Sedona, Arizona, and show how the info box on the English version of the
WikiPedia topic page for Sedona https://en.wikipedia.org/wiki/Sedona,_Arizona maps to the DBPedia
page http://dbpedia.org/page/Sedona,_Arizona. Please open both of these WikiPedia and DBPedia URIs
in two browser tabs and keep them open for reference.


I assume that the format of the WikiPedia page is familiar so let’s look at the DBPedia page for
Sedona that in human readable form shows the RDF statements with Sedona Arizona as the subject.
RDF is used to model and represent data. An RDF statement is defined by three values, so an
instance of an RDF statement is called a triple. The three parts are:


• subject: a URI (also referred to as a “Resource”)

• property: a URI (also referred to as a “Resource”)

• value: a URI (also referred to as a “Resource”) or a literal value (like a string or a number
with optional units)


The subject for each Sedona-related triple is the above URI for the DBPedia human readable page.
The subject and property references in an RDF triple will almost always be a URI that can ground
an entity to information on the web. The human readable page for Sedona lists several properties
and the values of these properties. One of the properties is “dbo:areaCode” where “dbo” is a
namespace reference (in this case for a DatatypeProperty,
http://www.w3.org/2002/07/owl#DatatypeProperty).
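

To make the three parts of a triple concrete, here is a minimal sketch of how one might model a
triple in Swift. The types RDFValue and Triple are hypothetical and are not used by the libraries
in this book. The subject and property URIs follow DBPedia's resource and ontology naming, and the
area code literal is illustrative, so check the DBPedia page for the actual value.

```swift
// A minimal, hypothetical model of an RDF triple (not code from the book's libraries).
enum RDFValue: Equatable {
    case uri(String)        // a resource
    case literal(String)    // a literal such as a string or a number
}

struct Triple: Equatable {
    let subject: String     // a URI
    let property: String    // a URI
    let value: RDFValue     // a URI or a literal
}

// The area code value here is illustrative; check the DBPedia page for the real data.
let sedonaAreaCode = Triple(
    subject: "http://dbpedia.org/resource/Sedona,_Arizona",
    property: "http://dbpedia.org/ontology/areaCode",
    value: .literal("928"))
print(sedonaAreaCode)
```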





https://en.wikipedia.org/wiki/Sedona,_Arizona
http://dbpedia.org/page/Sedona,_Arizona
http://www.w3.org/2002/07/owl#DatatypeProperty


The following two figures show an abstract representation of linked data and then a sample of linked
data with actual web URIs for resources and properties:


Abstract RDF representation with 2 Resources, 2 literal values, and 3 Properties


Resources: <http://markwatson.com/index.rdf#mark_watson>, <http://markwatson.com/index.rdf#Sun_ONE>
Properties: <http://www.w3.org/2000/10/swap/pim/contact#company>,
<http://www.ontoweb.org/ontology/1#author>, <http://www.ontoweb.org/ontology/1#booktitle>
Literal values: “Capital One”, “Sun ONE Services - J2EE”

Concrete example using RDF seen in last chapter showing the RDF representation with 2 Resources, 2 literal values,
and 3 Properties


We will use the SPARQL query language (SPARQL for RDF data is similar to SQL for relational
database queries). Let’s look at an example using the RDF in the last figure:


select ?v where { <http://markwatson.com/index.rdf#Sun_ONE>
                  <http://www.ontoweb.org/ontology/1#booktitle>
                  ?v }


This query should return the result “Sun ONE Services - J2EE”. If you wanted to query for all URI
resources that are books with the literal value of their titles, then you can use:


select ?s ?v where { ?s
                     <http://www.ontoweb.org/ontology/1#booktitle>
                     ?v }


Note that ?s and ?v are arbitrary query variable names, here standing for “subject” and “value”. You
can use more descriptive variable names like:


select ?bookURI ?bookTitle where
  { ?bookURI
    <http://www.ontoweb.org/ontology/1#booktitle>
    ?bookTitle }
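

To build some intuition for how a triple pattern binds query variables, here is a small in-memory
sketch. It is only an illustration: the Statement type and the match function are hypothetical,
and a real SPARQL engine such as the one behind DBPedia does far more than this.

```swift
// Hypothetical in-memory illustration of how a SPARQL triple pattern binds
// variables; a real SPARQL engine does far more than this.
struct Statement {
    let subject: String
    let property: String
    let value: String
}

// A term starting with "?" is a variable; anything else must match exactly.
func match(pattern: Statement, in store: [Statement]) -> [[String: String]] {
    var results: [[String: String]] = []
    for s in store {
        var bindings: [String: String] = [:]
        var ok = true
        for (term, actual) in [(pattern.subject, s.subject),
                               (pattern.property, s.property),
                               (pattern.value, s.value)] {
            if term.hasPrefix("?") {
                // the same variable must bind to the same value everywhere
                if let bound = bindings[term], bound != actual { ok = false }
                bindings[term] = actual
            } else if term != actual {
                ok = false
            }
        }
        if ok { results.append(bindings) }
    }
    return results
}

let store = [
    Statement(subject: "http://markwatson.com/index.rdf#Sun_ONE",
              property: "http://www.ontoweb.org/ontology/1#booktitle",
              value: "Sun ONE Services - J2EE")
]
let pattern = Statement(subject: "?bookURI",
                        property: "http://www.ontoweb.org/ontology/1#booktitle",
                        value: "?bookTitle")
print(match(pattern: pattern, in: store))
```

Each result is a dictionary from variable name to bound value, which is also the shape of the
results returned by the SPARQL library used later.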


We will be diving a little deeper into RDF examples in the next chapter when we write a tool for
using RDF data from DBPedia to find information about entities (e.g., people, places, organizations)
and the relationships between entities. For now I want you to understand the idea of RDF statements
represented as triples, that web URIs represent things, properties, and sometimes values, and that
URIs can be followed manually (often called “dereferencing”) to see what they reference in human
readable form.


Understanding the Resource Description Framework (RDF)


Text data on the web has some structure in the form of HTML elements like headers, page titles, 
anchor links, etc. but this structure is too imprecise for general use by software agents. RDF is a 
method for encoding structured data in a more precise way. 


RDF specifies graph structures and can be serialized for storage or for service calls in XML,
Turtle, N3, and other formats. I like the Turtle format and suggest that you pause reading this
book for a few minutes and look at this World Wide Web Consortium Turtle RDF primer at
https://www.w3.org/2007/02/turtle/primer/.
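

As a small taste of what serialized RDF looks like, the following sketch writes one statement in
N-Triples form, the simplest one-triple-per-line syntax (a valid N-Triples line is also valid
Turtle). The function nTriple is hypothetical, and it only escapes backslashes and double quotes
in the literal, so treat it as an illustration and not a complete serializer.

```swift
import Foundation

// Hypothetical sketch: write one RDF statement whose value is a string
// literal as a single N-Triples line.
func nTriple(subject: String, property: String, literal: String) -> String {
    // Escape backslashes and double quotes inside the literal.
    let escaped = literal
        .replacingOccurrences(of: "\\", with: "\\\\")
        .replacingOccurrences(of: "\"", with: "\\\"")
    return "<\(subject)> <\(property)> \"\(escaped)\" ."
}

print(nTriple(subject: "http://markwatson.com/index.rdf#Sun_ONE",
              property: "http://www.ontoweb.org/ontology/1#booktitle",
              literal: "Sun ONE Services - J2EE"))
```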


Frequently Used Resource Namespaces 


The following standard namespaces are frequently used: 


• RDF https://www.w3.org/TR/rdf-syntax-grammar/
• RDFS https://www.w3.org/TR/rdf-schema/
• OWL http://www.w3.org/2002/07/owl#
• XSD http://www.w3.org/2001/XMLSchema#
• FOAF http://xmlns.com/foaf/0.1/
• SKOS http://www.w3.org/2004/02/skos/core#
• DOAP http://usefulinc.com/ns/doap#
• DC http://purl.org/dc/elements/1.1/
• DCTERMS http://purl.org/dc/terms/
• VOID http://rdfs.org/ns/void#


Let’s look into the Friend of a Friend (FOAF) namespace. Click on the above link for FOAF
http://xmlns.com/foaf/0.1/ and find the definitions for the FOAF Core:


Agent
Person
name
title
img
depiction (depicts)
familyName
givenName
knows
based_near
age
made (maker)
primaryTopic (primaryTopicOf)
Project
Organization
Group
member
Document
Image


and for the Social Web: 


mbox
homepage
weblog
openid
jabberID
mbox_sha1sum
interest
topic_interest
topic (page)
workplaceHomepage
workInfoHomepage
schoolHomepage
publications
currentProject
pastProject
account
OnlineAccount
accountName
accountServiceHomepage
PersonalProfileDocument
tipjar
sha1
thumbnail
logo


You have now seen a few common schemas for RDF data. Another schema that is widely used for
annotating web sites, but that we won’t need for our examples here, is schema.org
(https://schema.org).


Understanding the SPARQL Query Language 


For the purposes of the material in this book, the two sample SPARQL queries here are sufficient for
you to get started using my SPARQL library https://github.com/mark-watson/SparqlQuery_swift
with arbitrary RDF data sources and simple queries.

















My Swift SPARQL library open in Xcode


The Apache Foundation has a good introduction to SPARQL
(https://jena.apache.org/tutorials/sparql.html) that I refer you to for more information.


Semantic Web and Linked Data Wrap Up 


In the next chapter we will use natural language processing to identify entities in raw text and
then use SPARQL queries to fetch structured information about them. We will be using my Swift
SPARQL library https://github.com/mark-watson/SparqlQuery_swift as well as two pre-trained CoreML
deep learning models.





https://schema.org
https://github.com/mark-watson/SparqlQuery_swift
https://jena.apache.org/tutorials/sparql.html


Example Application: iOS and macOS Versions of my KnowledgeBookNavigator


I used many of the techniques discussed in this book, the Swift language, and the SwiftUI user
interface framework to develop a Swift version of my Knowledge Graph Navigator (KGN) application
for macOS. I originally wrote this as an example program in Common Lisp for another book project.


The GitHub repository for the KGN example is https://github.com/mark-watson/KGN. I copied the 
code from my stand-alone Swift libraries to this example to make it self contained. The easiest way 
to browse the source code is to open this project in Xcode. 


I submitted the KGN app that we discuss in this chapter to Apple’s store and it is available as a
macOS app. If you load this project into Xcode, you can also build and run the iOS and iPadOS
targets.


You will need to have read through the last chapter on semantic web and linked data technologies
to understand this example because quite a lot of the code has embedded SPARQL queries to get
information from DBPedia.org (https://dbpedia.org).


The other major part of this app is a slightly modified version of Apple’s question answering (QA) 
example using the BERT model in CoreML. Apple’s code is in the subdirectory AppleBERT. Please 
read the README file for this project and follow the directions for downloading and using Apple’s 
model and vocabulary file. 


Screen Shots of macOS Application 


In the first screenshot seen below, I had entered query text that included “Steve Jobs” and the popup 
list selector is used to let the user select which “Steve Jobs” entity from DBPedia that they want to 
use. 





https://github.com/mark-watson/KGN
https://dbpedia.org


Entered query and KGN is asking user to disambiguate which “Steve Jobs” they want information for


Query: Bill Gates and Melinda Gates and Steve Jobs visited Microsoft in Seattle

RELATIONSHIPS:

Bill Gates knownFor Microsoft
Microsoft foundedBy Bill Gates
Bill Gates birthPlace Seattle
Microsoft founders Bill Gates

ENTITY DETAILS: English-language descriptions of Microsoft, Bill Gates, Steve Jobs, and Seattle

Showing results


The previous screenshot shows the results to the query displayed as English text. 


Notice the app prompt “Behind the scenes SPARQL queries” near the bottom of the app window. If
you click on this field then the SPARQL queries used to answer the question are shown, as on the
next screenshot:


# SPARQL to find all URIs for name: Bill Gates
SELECT DISTINCT ?person_uri ?comment {
  ?person_uri <http://xmlns.com/foaf/0.1/name> "Bill Gates"@en .
  OPTIONAL { ?person_uri <http://www.w3.org/2000/01/rdf-schema#comment>
     ?comment . FILTER (lang(?comment) = 'en') } .
} LIMIT 10
# SPARQL to find all URIs for name: Melinda Gates
SELECT DISTINCT ?person_uri ?comment {
  ?person_uri <http://xmlns.com/foaf/0.1/name> "Melinda Gates"@en .
  OPTIONAL { ?person_uri <http://www.w3.org/2000/01/rdf-schema#comment>
     ?comment . FILTER (lang(?comment) = 'en') } .
} LIMIT 10
# SPARQL to find all URIs for name: Steve Jobs
SELECT DISTINCT ?person_uri ?comment {
  ?person_uri <http://xmlns.com/foaf/0.1/name> "Steve Jobs"@en .
  OPTIONAL { ?person_uri <http://www.w3.org/2000/01/rdf-schema#comment>
     ?comment . FILTER (lang(?comment) = 'en') } .
} LIMIT 10
# Microsoft
SELECT DISTINCT ?org_uri ?comment {
  ?org_uri rdfs:label "Microsoft"@en .
  ?org_uri <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://schema.org/Organization> .
  OPTIONAL { ?org_uri <http://www.w3.org/2000/01/rdf-schema#comment>
     ?comment . FILTER (lang(?comment) = 'en') } .
} LIMIT 2
# Seattle
SELECT DISTINCT ?place_uri ?comment {
  ?place_uri rdfs:label "Seattle"@en .
  ?place_uri <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://schema.org/Place> .
  OPTIONAL { ?place_uri <http://www.w3.org/2000/01/rdf-schema#comment>
     ?comment . FILTER (lang(?comment) = 'en') } .
} LIMIT 10


Showing SPARQL queries used to gather data


Application Code Listings 


I will list some of the code for this example application and I suggest that you, dear reader, also open 
this project in Xcode in order to navigate the sample code and more carefully read through it. 


SPARQL 


I introduced you to the use of SPARQL in the last chapter. This library can be used by adding
a reference to the Package.swift file for this project. You can also clone the GitHub repository
https://github.com/mark-watson/SparqlQuery_swift to have the source code for local viewing and
modification. I have also copied the code into the KGN project.


The file SparqlQuery.swift is shown here:


import Foundation

public func sparqlDbPedia(query: String) -> Array<Dictionary<String,String>> {
    return SparqlEndpointHelpter(query: query,
                                 endPointUri: "https://dbpedia.org/sparql?query=") }

public func sparqlWikidata(query: String) -> Array<Dictionary<String,String>> {
    return SparqlEndpointHelpter(query: query,
        endPointUri:
          "https://query.wikidata.org/bigdata/namespace/wdq/sparql?query=") }

public func SparqlEndpointHelpter(query: String,
                                  endPointUri: String) ->
        Array<Dictionary<String,String>> {
    var ret = Set<Dictionary<String,String>>();
    var content = "{}"

    let maybeString = cacheLookupQuery7(key: query)
    if maybeString?.count ?? 0 > 0 {
        content = maybeString ?? ""
    } else {
        let requestUrl = URL(string: String(endPointUri +
            query.addingPercentEncoding(withAllowedCharacters:
                .urlHostAllowed)!) + "&format=json")!
        do { content = try String(contentsOf: requestUrl) }
        catch let error { print(error) }
    }
    let json = try? JSONSerialization.jsonObject(with: Data(content.utf8),
                                                 options: [])
    if let json2 = json as! Optional<Dictionary<String, Any?>> {
      if let head = json2["head"] as? Dictionary<String, Any> {
        if let xvars = head["vars"] as! NSArray? {
          if let results = json2["results"] as? Dictionary<String, Any> {
            if let bindings = results["bindings"] as! NSArray? {
              if bindings.count > 0 {
                for i in 0...(bindings.count-1) {
                  if let first_binding =
                      bindings[i] as? Dictionary<String,
                                                 Dictionary<String,String>> {
                    var ret2 = Dictionary<String,String>();
                    for key in xvars {
                      let key2 : String = key as! String
                      if let vals = (first_binding[key2]) {
                        let vv : String = vals["value"] ?? "err2"
                        ret2[key2] = vv } }
                    if ret2.count > 0 {
                      ret.insert(ret2)
                    }}}}}}}}}
    return Array(ret) }
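

The nested if-let statements above walk the standard SPARQL JSON results layout by hand: a “head”
object with a “vars” list, and a “results” object with a “bindings” list. As a hedged alternative,
the same layout can be described with Codable types. This sketch is not part of my library; the
names SparqlResponse and parseSparqlJson are hypothetical, the sample JSON is made up, and unlike
the function above it does not remove duplicate rows.

```swift
import Foundation

// Hypothetical Codable types mirroring the SPARQL JSON results layout that
// the helper above walks by hand: head.vars and results.bindings.
struct SparqlResponse: Decodable {
    struct Head: Decodable { let vars: [String] }
    struct Term: Decodable { let value: String }
    struct Results: Decodable { let bindings: [[String: Term]] }
    let head: Head
    let results: Results
}

func parseSparqlJson(_ json: String) -> [[String: String]] {
    guard let response = try? JSONDecoder().decode(SparqlResponse.self,
                                                   from: Data(json.utf8)) else {
        return []
    }
    return response.results.bindings.map {
        (binding: [String: SparqlResponse.Term]) -> [String: String] in
        var row: [String: String] = [:]
        for name in response.head.vars {
            if let term = binding[name] { row[name] = term.value }
        }
        return row
    }
}

// A made-up response, for illustration only:
let sampleJson = """
{"head": {"vars": ["s", "v"]},
 "results": {"bindings": [
   {"s": {"type": "uri", "value": "http://example.com/book1"},
    "v": {"type": "literal", "value": "A Title"}}]}}
"""
print(parseSparqlJson(sampleJson))
```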


The file QueryCache.swift contains code written by Khoa Pham (MIT License) that can be found
in the GitHub repository https://github.com/onmyway133/EasyStash. This file is used to cache
SPARQL queries and the results. In testing this application I noticed that there were many repeated
queries to DBPedia so I decided to cache results. Here is the simple API I added on top of Khoa
Pham’s code:


//  Created by khoa on 27/05/2019.
//  Copyright © 2019 Khoa Pham. All rights reserved. MIT License.
//  https://github.com/onmyway133/EasyStash
//

import Foundation

// Mark's simple wrapper:
var storage: Storage? = nil

public func cacheStoreQuery(key: String, value: String) {
    do { try storage?.save(object: value, forKey: key) } catch {}
}
public func cacheLookupQuery7(key: String) -> String? {
    // optional DEBUG code: clear cache
    //do { try storage?.removeAll() } catch { print("ERROR CLEARING CACHE") }
    do {
        return try storage?.load(forKey: key, as: String.self)
    } catch { return "" }
}

// remaining code not shown for brevity.
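

The caching idea itself (look up a query first, and only fetch and store when nothing is cached)
can be sketched without EasyStash. The class MemoryQueryCache below is hypothetical and keeps
results in a dictionary, so unlike the real cache nothing is persisted between runs. The query
string and the returned text are placeholders.

```swift
// Hypothetical in-memory stand-in for the query cache; unlike EasyStash,
// nothing is persisted to disk.
final class MemoryQueryCache {
    private var store: [String: String] = [:]

    // Return the cached value for `key`, or call `fetch` once and remember it.
    func value(forKey key: String, fetch: () -> String) -> String {
        if let cached = store[key], !cached.isEmpty { return cached }
        let fresh = fetch()
        store[key] = fresh
        return fresh
    }
}

let cache = MemoryQueryCache()
var fetchCount = 0
for _ in 1...3 {
    _ = cache.value(forKey: "select ?s where { ?s ?p ?o } limit 1") {
        fetchCount += 1   // stands in for a network call to a SPARQL endpoint
        return "{\"results\": \"placeholder\"}"
    }
}
print("fetches performed: \(fetchCount)")
```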


The code in file GenerateSparql.swift is used to generate queries for DBPedia. The line-wrapping
for embedded SPARQL queries in the next code section is difficult to read so you may want to open
the source file in Xcode. Please note that the KGN application prints out the SPARQL queries used to
fetch information from DBPedia. The embedded SPARQL query templates used here have variable
slots that are filled in at runtime to customize the queries.
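

As a side note, a template with slots can also be written with a Swift multi-line string literal
and string interpolation, which some readers may find easier to read than concatenation. This is
only a sketch: the function personNameQuery is hypothetical and is not in the KGN source code, and
a name containing a double quote would still need to be escaped before being inserted.

```swift
import Foundation

// Hypothetical rewrite of one query template using a multi-line string
// literal; the interpolated \(nameString) is the "slot" filled at runtime.
func personNameQuery(nameString: String, limit: Int = 10) -> String {
    return """
    # SPARQL to find all URIs for name: \(nameString)
    SELECT DISTINCT ?person_uri ?comment {
      ?person_uri <http://xmlns.com/foaf/0.1/name> "\(nameString)"@en .
      OPTIONAL { ?person_uri <http://www.w3.org/2000/01/rdf-schema#comment>
         ?comment . FILTER (lang(?comment) = 'en') } .
    } LIMIT \(limit)

    """
}

print(personNameQuery(nameString: "Bill Gates"))
```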





*"https://github.com/onmyway133/EasyStash 
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LF 

// GenerateSparq!.swift 

//  KGNbeta1 

// 

// Created by Mark Watson on 2/28/20. 

// Copyright © 2021 Mark Watson. All rights reserved. 
// 


import Foundation 


public func uri_to_display_text(uri: String) 
-> String { 
return uri.replacingOccurrences(of: "http://dbpedia.org/resource/Category/", 
with: ""). 
replacingOccurrences(of: "http://dbpedia.org/resource/", 
with: ""). 
replacingOccurrences(of: "_", with: " ") 


public func get_SPARQL_for_finding_URIs_for_PERSON_NAME(nameString: String) 
-> String { 
return 
"# SPARQL to find all URIs for name: " + 
nameString + "\nSELECT DISTINCT ?person_uri ?comment {\n" + 
" ?person_uri <http://xmlns.com/foaf/0.1/name> \"" + 
nameString + "\"@en .\n" + 
" OPTIONAL { ?person_uri <http://www.w3.org/2000/01/rdf-schema#comment>\n" + 
e ?comment . FILTER (lang(?comment) = 'en') } .\n" + 


"} LIMIT 41@\n" 


public func get_SPARQL_for_PERSON_URI(aURI: String) -> String { 
return 

"# <" + QURI + ">\nSELECT DISTINCT ?comment (GROUP_CONCAT(DISTINCT ?birthpla\ 
ce; SEPARATOR=' | ') AS ?birthplace)\n (GROUP_CONCAT(DISTINCT ?almamater; SEPARATOR\ 
=' | ') AS ?almamater) (GROUP_CONCAT(DISTINCT ?spouse; SEPARATOR=' | ') AS ?spouse) \ 
{\n" + 

"  <" + QURI + "> <http://www.w3.org/2000/01/rdf-schema#comment> ?comment .\ 
FILTER (lang(?comment) = 'en') .\n" + 

"OPTIONAL { <" + aURI + "> <http://dbpedia.org/ontology/birthPlace> ?birth\ 
place } .\n" + 

"OPTIONAL { <" + aURI + "> <http://dbpedia.org/ontology/almaMater> ?almama\ 
ter } .\n" + 
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"OPTIONAL { <" + aURI + "> <http://dbpedia.org/ontology/spouse> ?spouse } \ 
Nal + 
") LIMIT 5\n" 


public func get_display_text_for_PERSON_URI(personURI: String) -> [String] { 
var ret: String = "\(uri_to_display_text(uri: personURI) )\n\n" 
let person_details_spargl = get_SPARQL_for_PERSON_URI(aURI: personURI) 
let person_details = sparqlDbPedia(query: person_details_sparq]l ) 


for pd in person_details { 
//let comment = pd["comment"] 
ret.append("\(pd["comment"] ?? "")\n\n") 
let subject_uris = pd["subject_uris" | 
let uri_list: [String] = subject_uris?.components(separatedBy: " | ") ?? [] 
//ret.append("<ul>\n") 
for u in uri_list { 
let subject = uri_to_display_text(uri: u) 
ret.append('"\(subject)\n") } 
//ret.append("</ul>\n") 
if let spouse = pd["spouse"] { 
if spouse.count > @ { 
ret.append("Spouse: \(uri_to_display_text(uri: spouse))\n") } } 
if let almamater = pd["almamater"] { 
if almamater.count > @ { 
ret.append("Almamater: \(uri_to_display_text(uri: almamater))\n") } } 
if let birthplace = pd["birthplace"] { 
if birthplace.count > @ { 
ret.append("Birthplace: \(uri_to_display_text(uri: birthplace))\n") \ 
} } 
} 


return ["# SPARQL for a specific person:\n" + person_details_sparql, ret] 


public func get_SPARQL_for_finding_URIs_for_PLACE_NAME(placeString: String)
    -> String {
    return "# " + placeString + "\nSELECT DISTINCT ?place_uri ?comment {\n" +
        //" ?place_uri <http://xmlns.com/foaf/0.1/name> \"" + placeString + "\"@en .\n" +
        " ?place_uri rdfs:label \"" + placeString + "\"@en .\n" +
        " ?place_uri <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://schema.org/Place> .\n" +
        "OPTIONAL { ?place_uri <http://www.w3.org/2000/01/rdf-schema#comment>\n" +
        "  ?comment . FILTER (lang(?comment) = 'en') } .\n" +
        "} LIMIT 10\n"
}
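For comparison, here is a sketch of the same place-name query text written with a Swift multi-line string literal and interpolation instead of "+" concatenation. The function name placeNameQuery is my own for this illustration and is not part of the KGN source; the query text follows the listing above.

```swift
import Foundation

// Sketch: build the place-name query with a multi-line string literal.
// The indentation of the closing delimiter is stripped from every line.
func placeNameQuery(_ placeString: String) -> String {
    return """
    # \(placeString)
    SELECT DISTINCT ?place_uri ?comment {
      ?place_uri rdfs:label "\(placeString)"@en .
      ?place_uri <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://schema.org/Place> .
      OPTIONAL { ?place_uri <http://www.w3.org/2000/01/rdf-schema#comment>
         ?comment . FILTER (lang(?comment) = 'en') } .
    } LIMIT 10

    """
}

let placeQuery = placeNameQuery("Paris")
print(placeQuery)
```

Double quotes need no escaping inside a multi-line literal, which makes the SPARQL easier to read. Neither version escapes quote characters inside placeString, so both assume the entity names come from trusted input.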
public func get_SPARQL_for_PLACE_URI(aURI: String) -> String {
    return "# <" + aURI + ">\nSELECT DISTINCT ?comment (GROUP_CONCAT(DISTINCT ?subject_uris; SEPARATOR=' | ') AS ?subject_uris) {\n" +
        "  <" + aURI + "> <http://www.w3.org/2000/01/rdf-schema#comment> ?comment . FILTER (lang(?comment) = 'en') .\n" +
        "OPTIONAL { <" + aURI + "> <http://purl.org/dc/terms/subject> ?subject_uris } .\n" +
        "} LIMIT 5\n"
}


public func get_HTML_for_place_URI(placeURI: String) -> String {
    var ret: String = "<h2>" + placeURI + "</h2>\n"
    let place_details_sparql = get_SPARQL_for_PLACE_URI(aURI: placeURI)
    let place_details = sparqlDbPedia(query: place_details_sparql)

    for pd in place_details {
        //let comment = pd["comment"]
        ret.append("<p><strong>\(pd["comment"] ?? "")</strong></p>\n")
        let subject_uris = pd["subject_uris"]
        let uri_list: [String] = subject_uris?.components(separatedBy: " | ") ?? []
        ret.append("<ul>\n")
        for u in uri_list {
            let subject = u.replacingOccurrences(of: "http://dbpedia.org/resource/Category:", with: "")
                .replacingOccurrences(of: "_", with: " ")
            ret.append("  <li>\(subject)</li>\n")
        }
        ret.append("</ul>\n")
    }
    return ret
}
public func get_SPARQL_for_finding_URIs_for_ORGANIZATION_NAME(orgString: String) -> String {
    return "# " + orgString + "\nSELECT DISTINCT ?org_uri ?comment {\n" +
        " ?org_uri rdfs:label \"" + orgString + "\"@en .\n" +
        " ?org_uri <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://schema.org/Organization> .\n" +
        "OPTIONAL { ?org_uri <http://www.w3.org/2000/01/rdf-schema#comment>\n" +
        "  ?comment . FILTER (lang(?comment) = 'en') } .\n" +
        "} LIMIT 2\n"
}


The file AppSparql.swift contains more utility functions for getting entity and relationship data from DBPedia:


//  AppSparql.swift
//  Created by ML Watson on 7/18/21.

import Foundation

let detailSparql = """
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
select ?entity ?label ?description ?comment where {
    ?entity rdfs:label "<name>"@en .
    ?entity schema:description ?description . filter (lang(?description) = 'en')
    filter(!regex(?description,"Wikimedia disambiguation page")) .
} limit 5000
"""

let personSparql = """
select ?uri ?comment {
    ?uri <http://xmlns.com/foaf/0.1/name> "<name>"@en .
    ?uri <http://www.w3.org/2000/01/rdf-schema#comment> ?comment .
    FILTER (lang(?comment) = 'en') .
}
"""

let personDetailSparql = """
SELECT DISTINCT ?label ?comment
  (GROUP_CONCAT (DISTINCT ?birthplace; SEPARATOR=' | ') AS ?birthplace)
  (GROUP_CONCAT (DISTINCT ?almamater; SEPARATOR=' | ') AS ?almamater)
  (GROUP_CONCAT (DISTINCT ?spouse; SEPARATOR=' | ') AS ?spouse) {
    <name> <http://www.w3.org/2000/01/rdf-schema#comment> ?comment .
    FILTER (lang(?comment) = 'en') .
    OPTIONAL { <name> <http://dbpedia.org/ontology/birthPlace> ?birthplace } .
    OPTIONAL { <name> <http://dbpedia.org/ontology/almaMater> ?almamater }
    OPTIONAL { <name> <http://dbpedia.org/ontology/spouse> ?spouse } .
    OPTIONAL { <name> <http://www.w3.org/2000/01/rdf-schema#label> ?label
               FILTER (lang(?label) = 'en') }
} LIMIT 10
"""


let placeSparql = """
SELECT DISTINCT ?uri ?comment WHERE {
    ?uri rdfs:label "<name>"@en .
    ?uri <http://www.w3.org/2000/01/rdf-schema#comment> ?comment .
    FILTER (lang(?comment) = 'en') .
    ?uri <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://schema.org/Place> .
} LIMIT 80
"""

let organizationSparql = """
SELECT DISTINCT ?uri ?comment WHERE {
    ?uri rdfs:label "<name>"@en .
    ?uri <http://www.w3.org/2000/01/rdf-schema#comment> ?comment .
    FILTER (lang(?comment) = 'en') .
    ?uri <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://schema.org/Organization> .
} LIMIT 80
"""


func entityDetail(name: String) -> [Dictionary<String,String>] {
    var ret: [Dictionary<String,String>] = []
    let sparql = detailSparql.replacingOccurrences(of: "<name>", with: name)
    print(sparql)
    let r = sparqlDbPedia(query: sparql)
    r.forEach { result in
        print(result)
        ret.append(result)
    }
    return ret
}

func personDetail(name: String) -> [Dictionary<String,String>] {
    var ret: [Dictionary<String,String>] = []
    let sparql = personSparql.replacingOccurrences(of: "<name>", with: name)
    print(sparql)
    let r = sparqlDbPedia(query: sparql)
    r.forEach { result in
        print(result)
        ret.append(result)
    }
    return ret
}

func placeDetail(name: String) -> [Dictionary<String,String>] {
    var ret: [Dictionary<String,String>] = []
    let sparql = placeSparql.replacingOccurrences(of: "<name>", with: name)
    print(sparql)
    let r = sparqlDbPedia(query: sparql)
    r.forEach { result in
        print(result)
        ret.append(result)
    }
    return ret
}

func organizationDetail(name: String) -> [Dictionary<String,String>] {
    var ret: [Dictionary<String,String>] = []
    let sparql = organizationSparql.replacingOccurrences(of: "<name>", with: name)
    print(sparql)
    let r = sparqlDbPedia(query: sparql)
    r.forEach { result in
        print(result)
        ret.append(result)
    }
    return ret
}
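All four detail functions follow the same pattern: take a SPARQL template, replace the "<name>" slot, and send the result to the DBPedia endpoint. The following minimal sketch isolates the slot-filling step so it can be tried without a network call. The names nameTemplate and fill(_:name:) are made up for this illustration, and the template is a shortened version of the person template.

```swift
import Foundation

// A trimmed-down template in the style of personSparql; "<name>" is the slot.
let nameTemplate = """
select ?uri ?comment {
  ?uri <http://xmlns.com/foaf/0.1/name> "<name>"@en .
}
"""

// Fill the slot; every occurrence of "<name>" in the template is replaced.
func fill(_ template: String, name: String) -> String {
    return template.replacingOccurrences(of: "<name>", with: name)
}

let filledQuery = fill(nameTemplate, name: "Bill Gates")
print(filledQuery)
```

Because the slot text includes the angle brackets, a template that uses <name> in a URI position needs a replacement value that is itself wrapped in angle brackets.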


public func processEntities(inputString: String) -> [(name: String, type: String, uri: String, comment: String)] {
    let entities = getEntities(text: inputString)
    var augmentedEntities: [(name: String, type: String, uri: String, comment: String)] = []
    for (entityName, entityType) in entities {
        print("** entityName:", entityName, "entityType:", entityType)
        if entityType == "PersonalName" {
            let data = personDetail(name: entityName)
            for d in data {
                augmentedEntities.append((name: entityName, type: entityType,
                    uri: "<" + d["uri"]! + ">", comment: "<" + d["comment"]! + ">"))
            }
        }
        if entityType == "OrganizationName" {
            let data = organizationDetail(name: entityName)
            for d in data {
                augmentedEntities.append((name: entityName, type: entityType,
                    uri: "<" + d["uri"]! + ">", comment: "<" + d["comment"]! + ">"))
            }
        }
        if entityType == "PlaceName" {
            let data = placeDetail(name: entityName)
            for d in data {
                augmentedEntities.append((name: entityName, type: entityType,
                    uri: "<" + d["uri"]! + ">", comment: "<" + d["comment"]! + ">"))
            }
        }
    }
    return augmentedEntities
}

extension Array where Element: Hashable {
    func uniqueValuesHelper() -> [Element] {
        var addedDict = [Element: Bool]()
        return filter { addedDict.updateValue(true, forKey: $0) == nil }
    }
    mutating func uniqueValues() {
        self = self.uniqueValuesHelper()
    }
}


func getAllRelationships(inputString: String) -> [String] {
    let augmentedEntities = processEntities(inputString: inputString)
    var relationshipTriples: [String] = []
    for ae1 in augmentedEntities {
        for ae2 in augmentedEntities {
            if ae1 != ae2 {
                let er1 = dbpediaGetRelationships(entity1Uri: ae1.uri,
                                                  entity2Uri: ae2.uri)
                relationshipTriples.append(contentsOf: er1)
                let er2 = dbpediaGetRelationships(entity1Uri: ae2.uri,
                                                  entity2Uri: ae1.uri)
                relationshipTriples.append(contentsOf: er2)
            }
        }
    }
    relationshipTriples.uniqueValues()
    relationshipTriples.sort()
    return relationshipTriples
}
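The duplicate removal used above relies on Dictionary.updateValue(_:forKey:) returning nil the first time a key is stored, so the filter keeps only the first occurrence of each element and preserves order. Here is a small self-contained sketch of the same idea. The method name firstOccurrences is made up so that it does not clash with the extension in the listing, and the triples are placeholder strings.

```swift
// Same technique as uniqueValuesHelper: keep an element only the first
// time it is seen, preserving the original order.
extension Array where Element: Hashable {
    func firstOccurrences() -> [Element] {
        var seen = [Element: Bool]()
        return filter { seen.updateValue(true, forKey: $0) == nil }
    }
}

// Placeholder triples with one duplicate.
var sampleTriples = ["<b> <p> <c> .", "<a> <p> <b> .", "<b> <p> <c> ."]
sampleTriples = sampleTriples.firstOccurrences()   // duplicate removed, order kept
sampleTriples.sort()                               // then sorted, as in getAllRelationships
print(sampleTriples)
```

Converting to a Set and back would also remove duplicates but gives no ordering guarantee. Since getAllRelationships sorts the result anyway, either approach would work there.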

AppleBERT


The files in the directory AppleBERT were copied from Apple’s example https://developer.apple.com/documentation/coreml/model_integration_samples/finding_answers_to_questions_in_a_text_document with a few changes to
return results in a format convenient for this application. Apple’s BERT documentation is
excellent and you should review it.


Relationships 


The file Relationships.swift fetches relationship data for pairs of DBPedia entities. Note that the
first SPARQL template has variable slots <e1> and <e2> that are replaced at runtime with the URIs
of the two entities whose relationships we are searching for:


// relationships between DBPedia entities
let relSparql = """
SELECT DISTINCT ?p {<e1> ?p <e2> . FILTER (!regex(str(?p), 'wikiPage', 'i'))} LIMIT 5
"""

public func dbpediaGetRelationships(entity1Uri: String, entity2Uri: String)
    -> [String] {
    var ret: [String] = []
    let sparql1 = relSparql.replacingOccurrences(of: "<e1>",
        with: entity1Uri).replacingOccurrences(of: "<e2>",
        with: entity2Uri)
    let r1 = sparqlDbPedia(query: sparql1)
    r1.forEach { result in
        if let relName = result["p"] {
            let rdfStatement = entity1Uri + " <" + relName + "> " + entity2Uri + " ."
            print(rdfStatement)
            ret.append(rdfStatement)
        }
    }
    let sparql2 = relSparql.replacingOccurrences(of: "<e1>",
        with: entity2Uri).replacingOccurrences(of: "<e2>",
        with: entity1Uri)
    let r2 = sparqlDbPedia(query: sparql2)
    r2.forEach { result in
        if let relName = result["p"] {
            let rdfStatement = entity2Uri + " <" + relName + "> " + entity1Uri + " ."
            print(rdfStatement)
            ret.append(rdfStatement)
        }
    }
    return Array(Set(ret))
}
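The function above can only be exercised against the live DBPedia endpoint. As a sketch of how the same statement-building logic could be tried offline, the version below takes the query function as a parameter so that a stub returning canned rows can be passed in. QueryFunction, relationships(from:to:using:), the stub, and the example.org URIs are all assumptions made for this illustration and are not part of the KGN source.

```swift
import Foundation

// Hypothetical query function type standing in for sparqlDbPedia(query:).
typealias QueryFunction = (String) -> [[String: String]]

let relTemplate = "SELECT DISTINCT ?p {<e1> ?p <e2> . FILTER (!regex(str(?p), 'wikiPage', 'i'))} LIMIT 5"

// Build RDF statements for one direction (e1 -> e2) using an injected
// query function. The entity URIs are expected to be wrapped in <...>.
func relationships(from e1: String, to e2: String,
                   using runQuery: QueryFunction) -> [String] {
    let sparql = relTemplate
        .replacingOccurrences(of: "<e1>", with: e1)
        .replacingOccurrences(of: "<e2>", with: e2)
    return runQuery(sparql).compactMap { row in
        row["p"].map { "\(e1) <\($0)> \(e2) ." }
    }
}

// Canned result instead of a network call, so the example runs offline.
let stubQuery: QueryFunction = { _ in [["p": "http://example.org/ontology/knows"]] }
let entityA = "<http://example.org/resource/A>"
let entityB = "<http://example.org/resource/B>"
let statements = relationships(from: entityA, to: entityB, using: stubQuery)
print(statements)
```

Rows that lack a "p" binding are skipped by compactMap, which matches the `if let relName = result["p"]` test in the listing.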

public func uriToPrintName(_ uri: String) -> String {
    let slashIndex = uri.lastIndex(of: "/")
    if slashIndex == nil { return uri }
    var s = uri[slashIndex!...]
    s = s.dropFirst()
    if s.count > 0 { s.removeLast() }
    return String(s).replacingOccurrences(of: "_", with: " ")
}

public func relationshipsoEnglish(rs: [String]) -> String {
    var lines: [String] = []
    for r in rs {
        let triples = r.split(separator: " ", maxSplits: 3,
                              omittingEmptySubsequences: true)
        if triples.count > 2 {
            lines.append(uriToPrintName(String(triples[0])) + " " +
                         uriToPrintName(String(triples[1])) + " " +
                         uriToPrintName(String(triples[2])))
        } else {
            lines.append(r)
        }
    }
    let linesNoDuplicates = Set(lines)
    return linesNoDuplicates.joined(separator: "\n")
}
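
To make the conversion from RDF statements to readable text concrete, here is a small standalone sketch of the same steps: split a statement on spaces, then reduce each URI to the text after its last "/". The function names and the example.org statement are made up for this illustration. Unlike uriToPrintName, this variant removes the final character only when it is ">".

```swift
import Foundation

// Turn "<http://example.org/resource/Alice_Smith>" into "Alice Smith".
func printName(_ uri: String) -> String {
    guard let slash = uri.lastIndex(of: "/") else { return uri }
    var s = uri[uri.index(after: slash)...]
    if s.hasSuffix(">") { s.removeLast() }
    return s.replacingOccurrences(of: "_", with: " ")
}

// Render one "<subject> <predicate> <object> ." statement as plain text.
// Statements with fewer than three parts are returned unchanged.
func toEnglish(_ statement: String) -> String {
    let parts = statement.split(separator: " ", omittingEmptySubsequences: true)
    guard parts.count > 2 else { return statement }
    return parts[0...2].map { printName(String($0)) }.joined(separator: " ")
}

// Made-up statement in the shape produced by dbpediaGetRelationships.
let sampleStatement = "<http://example.org/resource/Alice_Smith> <http://example.org/ontology/spouse> <http://example.org/resource/Bob_Smith> ."
print(toEnglish(sampleStatement))
```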

NLP


The file NlpWhiteboard.swift provides high-level NLP utility functions for the application:


//
//  NlpWhiteboard.swift
//  KGN
//
//  Copyright © 2021 Mark Watson. All rights reserved.
//

public struct NlpWhiteboard {

    var originalText: String = ""
    var people: [String] = []
    var places: [String] = []
    var organizations: [String] = []
    var sparql: String = ""

    init() { }

    mutating func set_text(originalText: String) {
        self.originalText = originalText
        let (people, places, organizations) = getAllEntities(text: originalText)
        self.people = people; self.places = places; self.organizations = organizations
    }

    mutating func query_to_choices(behindTheScenesSparqlText: inout String)
        -> [[[String]]] { // return inner: [comment, uri]
        var ret: Set<[[String]]> = []
        if people.count > 0 {
            for i in 0...(people.count - 1) {
                self.sparql =
                    get_SPARQL_for_finding_URIs_for_PERSON_NAME(nameString: people[i])
                behindTheScenesSparqlText += self.sparql
                let results = sparqlDbPedia(query: self.sparql)
                if results.count > 0 {
                    ret.insert(results.map { [($0["comment"] ?? ""),
                                              ($0["person_uri"] ?? "")] })
                }
            }
        }
        if organizations.count > 0 {
            for i in 0...(organizations.count - 1) {
                self.sparql = get_SPARQL_for_finding_URIs_for_ORGANIZATION_NAME(
                    orgString: organizations[i])
                behindTheScenesSparqlText += self.sparql
                let results = sparqlDbPedia(query: self.sparql)
                if results.count > 0 {
                    ret.insert(results.map { [($0["comment"] ?? ""),
                                              ($0["org_uri"] ?? "")] })
                }
            }
        }
        if places.count > 0 {
            for i in 0...(places.count - 1) {
                self.sparql = get_SPARQL_for_finding_URIs_for_PLACE_NAME(
                    placeString: places[i])
                behindTheScenesSparqlText += self.sparql
                let results = sparqlDbPedia(query: self.sparql)
                if results.count > 0 {
                    ret.insert(results.map { [($0["comment"] ?? ""),
                                              ($0["place_uri"] ?? "")] })
                }
            }
        }
        //print("\n\n+++++++ ret: \n", ret, "\n\n")
        return Array(ret)
    }
}
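
The shape of the value returned by query_to_choices can be hard to see in the listing: each query contributes one list of [comment, uri] pairs, and the lists are collected in a Set so that identical result lists are stored once. The sketch below shows that shape with canned rows instead of a network call. The function choices(from:uriKey:) and the sample rows are made up for this illustration.

```swift
// Rows in the form sparqlDbPedia is assumed to return: one dictionary
// per result, keyed by SPARQL variable name.
let sampleRows: [[String: String]] = [
    ["person_uri": "http://example.org/resource/A", "comment": "First comment"],
    ["person_uri": "http://example.org/resource/B"]   // this row has no comment
]

// Convert each row to [comment, uri], substituting "" for missing values.
func choices(from rows: [[String: String]], uriKey: String) -> [[String]] {
    return rows.map { [$0["comment"] ?? "", $0[uriKey] ?? ""] }
}

var collected: Set<[[String]]> = []
collected.insert(choices(from: sampleRows, uriKey: "person_uri"))
collected.insert(choices(from: sampleRows, uriKey: "person_uri"))   // identical list, ignored
print(collected.count)
```

The `count > 0` checks before each loop in the listing matter because a closed range such as 0...(count - 1) traps at runtime when count is zero. Iterating with `for person in people` would avoid the need for that guard.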


The file NLPutils.swift provides lower-level NLP utilities:


//
//  NLPutils.swift
//  KGN
//
//  Copyright © 2021 Mark Watson. All rights reserved.
//


import Foundation
import NaturalLanguage

public func getPersonDescription(personName: String) -> [String] {
    let sparql = get_SPARQL_for_finding_URIs_for_PERSON_NAME(nameString: personName)
    let results = sparqlDbPedia(query: sparql)
    return [sparql, results.map {
        ($0["comment"] ?? $0["abstract"] ?? "") }.joined(separator: ". ")]
}

public func getPlaceDescription(placeName: String) -> [String] {
    let sparql = get_SPARQL_for_finding_URIs_for_PLACE_NAME(placeString: placeName)
    let results = sparqlDbPedia(query: sparql)
    return [sparql, results.map { ($0["comment"] ??
        $0["abstract"] ?? "") }.joined(separator: " . ")]
}

public func getOrganizationDescription(organizationName: String) -> [String] {
    let sparql = get_SPARQL_for_finding_URIs_for_ORGANIZATION_NAME(
        orgString: organizationName)
    let results = sparqlDbPedia(query: sparql)
    print("=== getOrganizationDescription results =\n", results)
    return [sparql, results.map { ($0["comment"] ?? $0["abstract"] ?? "") }
        .joined(separator: ". ")]
}

let tokenizer = NLTokenizer(unit: .word)
let tagger = NSLinguisticTagger(tagSchemes: [.tokenType, .language, .lexicalClass,
                                             .nameType, .lemma], options: 0)
let options: NSLinguisticTagger.Options =
    [.omitPunctuation, .omitWhitespace, .joinNames]

let tokenizerOptions: NSLinguisticTagger.Options =
    [.omitPunctuation, .omitWhitespace, .joinNames]

public func getEntities(text: String) -> [(String, String)] {
    var words: [(String, String)] = []
    tagger.string = text
    let range = NSRange(location: 0, length: text.utf16.count)
    tagger.enumerateTags(in: range, unit: .word,
        scheme: .nameType, options: options) { tag, tokenRange, stop in
        let word = (text as NSString).substring(with: tokenRange)
        let tagType = tag?.rawValue ?? "unknown"
        // keep only tokens that were tagged as some kind of name
        if tagType != "unknown" && tagType != "OtherWord" {
            words.append((word, tagType))
        }
    }
    return words
}


public func tokenizeText(text: String) -> [String] {
    var tokens: [String] = []
    tokenizer.string = text
    tokenizer.enumerateTokens(in: text.startIndex..<text.endIndex) { tokenRange, _ in
        tokens.append(String(text[tokenRange]))
        return true
    }
    return tokens
}


let entityTagger = NLTagger(tagSchemes: [.nameType])
let entityOptions: NLTagger.Options = [.omitPunctuation, .omitWhitespace, .joinNames]
let entityTagTypess: [NLTag] = [.personalName, .placeName,
                                .organizationName]

public func getAllEntities(text: String) -> ([String], [String], [String]) {
    var words: [(String, String)] = []
    var people: [String] = []
    var places: [String] = []
    var organizations: [String] = []
    entityTagger.string = text
    entityTagger.enumerateTags(in: text.startIndex..<text.endIndex, unit: .word,
        scheme: .nameType, options: entityOptions) { tag, tokenRange in
        if let tag = tag, entityTagTypess.contains(tag) {
            let word = String(text[tokenRange])
            if tag.rawValue == "PersonalName" {
                people.append(word)
            } else if tag.rawValue == "PlaceName" {
                places.append(word)
            } else if tag.rawValue == "OrganizationName" {
                organizations.append(word)
            } else {
                print("\nERROR: unknown entity type: |\(tag.rawValue)|")
            }
            words.append((word, tag.rawValue))
        }
        return true
    }
    return (people, places, organizations)
}

func splitLongStrings(_ s: String, limit: Int) -> String {
    var ret: [String] = []
    let tokens = s.split(separator: " ")
    var subLine = ""
    for token in tokens {
        if subLine.count > limit {
            ret.append(subLine)
            // start the next line with the current token so no word is dropped
            subLine = String(token)
        } else {
            subLine = subLine + " " + token
        }
    }
    if subLine.count > 0 {
        ret.append(subLine)
    }
    return ret.joined(separator: "\n")
}
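
As a quick illustration of the line-splitting idea, here is a standalone variant with a sample call. It is a sketch under my own naming (wrapWords is not part of the KGN source) and it avoids the leading space at the start of the first line. As in splitLongStrings, a line is closed only after it has grown past the limit, so lines can be somewhat longer than limit characters.

```swift
// Variant of splitLongStrings: start a new line once the current one has
// grown past `limit` characters, never dropping a word.
func wrapWords(_ s: String, limit: Int) -> String {
    var lines: [String] = []
    var current = ""
    for token in s.split(separator: " ") {
        if current.count > limit {
            lines.append(current)
            current = String(token)
        } else if current.isEmpty {
            current = String(token)
        } else {
            current = current + " " + String(token)
        }
    }
    if !current.isEmpty { lines.append(current) }
    return lines.joined(separator: "\n")
}

print(wrapWords("one two three four five six", limit: 8))
// prints three lines: "one two three", "four five", "six"
```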

Views


This is not a book about SwiftUI programming, and indeed I expect many of you, dear readers, know
much more about UI development with SwiftUI than I do. I am not going to list the four view files:


- MainView.swift
- QueryView.swift
- AboutView.swift
- InfoView.swift


Main KGN 


The top-level app code in the file KGNApp.swift is fairly simple. I hardcoded the window size for
macOS; the window sizes for running this example on iPadOS or iOS are commented out:


import SwiftUI

@main
struct KGNApp: App {
    var body: some Scene {
        WindowGroup {
            MainView()
                .frame(width: 1200, height: 770)    // << here !!
                //.frame(width: 660, height: 770)   // << here !!
                //.frame(width: 500, height: 800)   // << here !!
        }
    }
}

I was impressed by the SwiftUI framework. Applications are fairly portable across macOS, iOS, and 
iPadOS. I am not a UI developer by profession (as this application shows) but I enjoyed learning just 
enough about SwiftUI to write this example application. 


Book Wrap Up 


I hope that you, dear reader, enjoyed this short book. While I enjoy programming in Swift and
appreciate how well Apple has integrated machine learning capabilities in their iOS/iPadOS/macOS 
ecosystems, I still find myself writing most of my experimental code in Lisp languages and using 
Python for deep learning experiments and projects. That said, I am very happy that I have done the 
work to add Swift, CoreML, and SwiftUI to my personal programming tool belt. 


I usually update my eBooks, so if there is some topic or application domain that you would like
added to future versions of this book, then please let me know. My email address is markw <at> 
markwatson <dot> com. 


